Multidimensional Modulation and Coding
for Band-Limited Digital Channels
Abstract: A class of multidimensional signals, based on what we call
generalized group alphabets, is introduced, and its basic properties are
derived. The combination of generalized group alphabets and coding is also
examined. Two coding schemes are considered: Ungerboeck's scheme for
combination with convolutional codes, and Ginzburg's scheme for
combination with block codes. The performance of these schemes makes them
attractive for transmission over band-limited digital channels.

I.  Introduction 

In digital radio communications both the available spectrum and the
transmitter power are limited. Thus, to cope with the ever-increasing
demand for more efficient transmission, new modulation techniques are
needed. One way to increase the transmission efficiency, suggested by
Shannon's fundamental theorem itself, is to increase the dimensionality
of the signal space [1], [12]. For this solution to have practical
applications, however, the system complexity should not increase
prohibitively. Conventional systems, like quadrature amplitude modulation
(QAM) and phase-shift keying (PSK), use two-dimensional signals obtained
through the in-phase and quadrature components of a sinusoidal carrier.
Four-dimensional signal spaces can be realized in a similar way by
simultaneously using two channels, each with separately modulated
in-phase and quadrature components. The two bandpass channels can be two
orthogonally polarized electromagnetic waves, or time-division or
frequency-division multiplexed signals transmitted on a common medium.
Results on specific designs of four- and eight-dimensional signal sets
can be found in [2]-[6].

We consider a structured class of multidimensional alphabets, which we
call "generalized group alphabets," that are based on a peak-energy
constraint. They generalize the "group codes" of Slepian [7], which are
based on an equal-energy constraint. The most striking feature of these
alphabets is that they exhibit a considerable degree of symmetry.

Generalized group alphabets form a large class of codes; to date, most of
the good alphabets that have been proposed for multidimensional signaling
belong to this family. After a description of the main features of these
alphabets, we show how they can be used in conjunction with error-control
codes. For this purpose the alphabets must be partitioned into a chain of
subsets, where the minimum distance between subsets increases with depth.
The concept of a fair partition is introduced, and it is shown how it can
be obtained through the action of a group of orthogonal matrices on a set
of vectors. The method of dividing a signal alphabet into subsets via the
action of an orthogonal group is due to Ginzburg [8]. Finally, we provide
some examples of actual designs that show how our techniques can be
applied to generate codes. However, no attempt has been made to discover
optimum codes.

II.  Generalized  Group  Alphabets 

Consider a set of $K$ $n$-dimensional vectors $X = \{X_1, \ldots, X_K\}$,
called the initial set, and $L$ orthogonal $n \times n$ matrices
$S_1, \ldots, S_L$ that form a finite group $G$ under multiplication.

Definition 1: The set of vectors $GX_1, GX_2, \ldots, GX_K$ obtained from
the action of $G$ on the vectors of the initial set is called a
generalized group alphabet (GGA). $G$ is called its generating group.

Definition 2: A GGA is called separable if the vectors of the initial set
are transformed by $G$ into either disjoint or coincident vector sets,
i.e.,

$$GX_j \cap GX_k = \begin{cases} \emptyset, & j \neq k \\ GX_j, & j = k. \end{cases}$$

If $\|X\|$ denotes the Euclidean length of a vector $X$, the quantity
$\|X\|^2$ is proportional to the energy of the signal associated with $X$
for transmission over a continuous channel. Since an orthogonal matrix
transforms a vector into one with the same length, the signals associated
with a GGA have as many energy levels as there are in the initial set.
The special case of a GGA with $K = 1$, and hence only one energy level,
was extensively studied in [7].

Definition 3: A GGA is called regular if the number of vectors in each
subalphabet $GX_j$, $j = 1, \ldots, K$, does not depend on $j$, i.e., each
vector of the initial set is transformed by $G$ into the same number of
distinct vectors. A regular GGA is called strongly regular if each set
$GX_j$ contains exactly $L$ distinct vectors.

The following result follows directly from the definitions.

Proposition 1: The number $M$ of vectors in a regular GGA is a multiple
of $K$. If the GGA is strongly regular, then $M = KL$.

Next we exhibit four examples of these alphabets. Notice that for $K = 1$
every GGA is regular, but not necessarily strongly regular [7], [16].

Alphabet 1 (Asymmetric M-PSK: Two Dimensions, One Energy Level): Choose
an initial vector $X = (\cos\theta, \sin\theta)$, $\theta$ a given
constant, an integer $M$, and consider the group of $2 \times 2$
orthogonal matrices of the form $R^i T^j$, $i = 0, 1, \ldots, M - 1$,
$j = 1, 2$, where

$$R = \begin{bmatrix} \cos(2\pi/M) & \sin(2\pi/M) \\ -\sin(2\pi/M) & \cos(2\pi/M) \end{bmatrix}$$

and

$$T = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

It is seen that the effect of $R$ on a two-dimensional vector is to
rotate it by an angle $2\pi/M$, and the effect of $T$ is to exchange its
components. This group has $2M$ elements and gives rise to a separable
alphabet of $M$ or $2M$ vectors, according to the choice of the initial
vector. Notice that the alphabet is strongly regular only when it has
$2M$ elements (asymmetric M-PSK [13], [14]).
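The dependence of the alphabet size on the initial vector can be checked numerically. The following is a minimal sketch, not part of the original paper: the function name `asymmetric_psk`, the rounding tolerance, and the sample angles are illustrative assumptions. It applies the $2M$ matrices $R^i T^j$ to $(\cos\theta, \sin\theta)$ and counts the distinct points.

```python
import math

def asymmetric_psk(M, theta, decimals=9):
    """Orbit of (cos(theta), sin(theta)) under the 2M matrices R^i T^j.

    Hypothetical helper: points are rounded to `decimals` places so that
    numerically equal points collapse into one set element.
    """
    phi = 2 * math.pi / M
    x = (math.cos(theta), math.sin(theta))
    points = set()
    for j in range(2):                      # T^j: j = 1 exchanges the components
        u = (x[1], x[0]) if j == 1 else x
        for i in range(M):                  # R^i, with R = [[cos, sin], [-sin, cos]]
            c, s = math.cos(i * phi), math.sin(i * phi)
            v = (c * u[0] + s * u[1], -s * u[0] + c * u[1])
            # "+ 0.0" normalizes -0.0 to 0.0 after rounding
            points.add(tuple(round(t, decimals) + 0.0 for t in v))
    return points

symmetric = asymmetric_psk(4, math.pi / 4)   # exchanging components maps X to itself
asymmetric = asymmetric_psk(4, 0.1)          # generic angle: two interleaved 4-PSK sets
print(len(symmetric), len(asymmetric))
```

With a generic angle the orbit has $2M$ points (the strongly regular case); when the exchanged vector already lies in the rotation orbit, only $M$ points remain.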

Alphabet 2 (Four Dimensions, One Energy Level): Consider the group of
matrices which act on a four-dimensional initial vector by permuting its
components and replacing them with their negatives. This group has
$4! \cdot 2^4 = 384$ elements. If the initial vector is
$X_1 = (a, a, a, 0)$, $a = 1/\sqrt{3}$, the resulting (separable)
alphabet has $M = 32$ distinct unit-energy vectors (see Fig. 1).

Fig. 1. Alphabet 2 and its fair partition.


Alphabet 3 (Two Dimensions, Three Energy Levels): Our third example is
shown in Fig. 2. Points 1, 2, 3, and 4 denote the four vectors in the
initial set. The matrices generating the code are those associated with
plane rotations by multiples of $\pi/2$. The resulting (strongly regular,
separable) alphabet is the conventional 16-QAM.

Fig. 2. Alphabet 3 and its fair partition.

Alphabet 4 (Four Dimensions, Two Energy Levels): This alphabet, which has
two energy levels, $K = 4$, and $M = 128$, is obtained from the initial
set of vectors

$$\begin{array}{rrrr} c & c & c & 0 \\ -b & c & c & 0 \\ c & -b & c & 0 \\ c & c & -b & 0 \end{array}$$

with $c = 0.389$ and $b = 0.939$. If we apply to this initial set the
same matrix group which generates Alphabet 2, we get a separable alphabet
with 128 vectors (see Fig. 3). Among them, 32 have energy $3c^2$, and 96
have energy $b^2 + 2c^2$. The average energy is 1.
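The sizes and energies quoted for Alphabets 2 and 4 can be reproduced by enumerating the signed permutations directly. This is a rough sketch under the assumption that the generating group is the full group of $4! \cdot 2^4$ signed permutation matrices; the helper names `orbit`, `alphabet`, and `min_sq_distance` are illustrative and do not come from the paper.

```python
import math
from itertools import combinations, permutations, product

def orbit(x):
    """Distinct images of x under all permutations and sign changes."""
    n = len(x)
    return {tuple(s * x[p] for s, p in zip(signs, perm))
            for perm in permutations(range(n))
            for signs in product((1, -1), repeat=n)}

def alphabet(initial_set):
    """Union of the orbits of the initial vectors (coincident orbits merge)."""
    return set().union(*(orbit(x) for x in initial_set))

def min_sq_distance(vectors):
    """Minimum squared Euclidean distance over all pairs of distinct vectors."""
    return min(sum((p - q) ** 2 for p, q in zip(u, v))
               for u, v in combinations(vectors, 2))

a = 1 / math.sqrt(3)
alphabet2 = alphabet([(a, a, a, 0.0)])

b, c = 0.939, 0.389
alphabet4 = alphabet([(c, c, c, 0.0), (-b, c, c, 0.0),
                      (c, -b, c, 0.0), (c, c, -b, 0.0)])

energies4 = [sum(t * t for t in v) for v in alphabet4]
average_energy4 = sum(energies4) / len(energies4)
energy_levels4 = sorted({round(e, 6) for e in energies4})

print(len(alphabet2), min_sq_distance(alphabet2))
print(len(alphabet4), energy_levels4, average_energy4)
```

In this sketch the last three initial vectors of Alphabet 4 generate the same 96-vector orbit, which is the "coincident" case allowed by the definition of a separable GGA.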

We consider now some distance properties of the elements of a GGA. Choose
a partition of it into $m$ subsets $Z_1, Z_2, \ldots, Z_m$. For each
subset $Z_i$, we can define the intradistance set as the set of all the
Euclidean distances among pairs of vectors in $Z_i$. For any pair of
distinct subsets $Z_i$, $Z_j$, we define their interdistance set as the
set of all the Euclidean distances between a vector in $Z_i$ and a vector
in $Z_j$.

Definition 4: The partition of a separable GGA into $m$ subsets
$Z_1, \ldots, Z_m$ is called fair if all the subsets are distinct,
include the same number of vectors, and their intradistance sets are
equal.

We shall now exhibit a constructive method to generate fair partitions of
a GGA. Consider the generating group $G$ of the GGA, one of its
subgroups, say $H$, and the partition of $G$ into left cosets of $H$. We
have the following result.

Theorem 1: If the left cosets of the subgroup $H$ are applied to the
initial set of a strongly regular GGA, this procedure results in a fair
partition of the GGA. Under the same hypotheses, if $H$ is a normal
subgroup, then left and right cosets give rise to the same fair
partition.

Proof: Let $S$ denote an element of $G$ not belonging to $H$, and $SH$
the corresponding left coset. If $X_i$, $X_j$ are two (not necessarily
distinct) vectors of the initial set, and $S_h$, $S_k$ are two elements
of $H$, the intradistance set associated with the coset $SH$ includes the
quantities

$$d_{ij}^2(S, S_h, S_k) \triangleq \|S S_h X_i - S S_k X_j\|^2$$

as $S_h$, $S_k$ run through $H$, and $X_i$, $X_j$ run through the initial
vector set. We have

$$d_{ij}^2(S, S_h, S_k) = \|X_i\|^2 + \|X_j\|^2 - 2 X_i^T S_h^T S^T S S_k X_j
= \|X_i\|^2 + \|X_j\|^2 - 2 X_i^T S_h^T S_k X_j$$

where the superscript $T$ denotes transpose.

As the right side of the last equation does not depend on $S$, we have
shown that the intradistance sets associated with the left cosets of $H$
are independent of the coset. Moreover, if $H$ is normal, then right
cosets and left cosets give rise to the same fair partition: in fact,
normality implies that for every $S$

$$SH = HS.$$

Fig. 3. Alphabet 4 and its fair partition.

What makes Theorem 1 work is the fact that the orthogonal matrices form a
group of isometries. Hence a more abstract formulation is possible,
extending to nonfinite groups. As pointed out by the editor, lattices and
sublattices equipped with isometric transformations (translations) fit
this more general approach. However, for our presentation we choose the
framework that was fruitfully used for the description of "group codes
for the Gaussian channel," and that was based on finite groups of
orthogonal matrices [7] (see also [8]).

The condition of strong regularity of the GGA can be removed, but in this
case it may happen that different cosets generate the same element of the
partition. Hence some of the cosets must be removed from consideration.
Moreover, notice that if $H$ is a normal subgroup of $G$, then we do not
need to distinguish between left and right coset partitions. On the
contrary, if $H$ is not normal, the partitions obtained from right cosets
may not be fair, as shown by the following counterexample.

Example 1: Let us consider the four-dimensional alphabet generated by the
action of the natural matrix representation of the permutation group
$S_4$ on the initial vector $(-3d/2, -d/2, d/2, 3d/2)$, $d$ a constant.
Let us consider the partition induced by the subgroup $H$ of the matrices
leaving invariant the fourth component of the initial vector. This
subgroup is isomorphic to $S_3$. The left and right coset partitions
associated with $H$ are shown in Table I. It can be seen that the
partition associated with right cosets is not fair because its
intradistance sets are not equal.
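The counterexample can be verified by brute force. The sketch below is illustrative only: it works in units of $d/2$, represents each element of $S_4$ as a tuple `p` with `p[i]` the new position of component `i` (an assumed convention), and compares the intradistance sets produced by left and right cosets of $H$.

```python
from itertools import combinations, permutations

X = (-3, -1, 1, 3)                      # initial vector in units of d/2
G = list(permutations(range(4)))        # S4; p[i] is the new position of component i
H = [p for p in G if p[3] == 3]         # permutations fixing the fourth position

def act(p, x):
    """Apply the permutation matrix of p to the vector x."""
    y = [0] * len(x)
    for i, value in enumerate(x):
        y[p[i]] = value
    return tuple(y)

def compose(p, q):
    """Group product p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def intradistances(subset):
    """Set of squared distances (units of d^2/4) among pairs of vectors."""
    return frozenset(sum((a - b) ** 2 for a, b in zip(u, v))
                     for u, v in combinations(sorted(subset), 2))

def coset_partition(cosets):
    """Subsets of the alphabet obtained by applying each coset to X."""
    return {frozenset(act(g, X) for g in coset) for coset in cosets}

left_cosets = {frozenset(compose(s, h) for h in H) for s in G}    # cosets SH
right_cosets = {frozenset(compose(h, s) for h in H) for s in G}   # cosets HS

left_partition = coset_partition(left_cosets)
right_partition = coset_partition(right_cosets)
left_sets = {intradistances(z) for z in left_partition}
right_sets = {intradistances(z) for z in right_partition}

print(len(left_sets), len(right_sets))
```

Under these conventions every left-coset subset keeps $3d/2$ in one fixed position, so all four subsets are isometric; every right-coset subset fixes the value of the fourth component, and the subsets with fourth component $\pm d/2$ have a different intradistance set from those with $\pm 3d/2$.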

In some cases, we are interested in further partitioning every element
$Z_i$ into the same number of subsets. We are led to the concept of a
chain partition. This concept is also found in the work of Ungerboeck
[10] and Ginzburg [8].

Definition 5: The chain partition of a separable GGA is called fair if
any two elements of the partition at the same level of the chain include
the same number of vectors and have equal intradistance sets.

For fair chain partitions we have the following theorem, whose proof is
straightforward and will be omitted.

Theorem 2: Consider a strongly regular GGA and a chain of subgroups of
its generating group $G$, that is,

$$H_1 \subset H_2 \subset H_3 \subset \cdots \subset H_t = G.$$

Use $H_{t-1}$ and its left cosets to generate a partition of the GGA.
Then use $H_{t-2}$ and its left cosets in $H_{t-1}$ to further partition
all the sets of the previous partition. Repeat the procedure with
$H_{t-3}$, and so on, until $H_1$ and its left cosets in $H_2$ are used.
The resulting chain partition of the GGA is fair.

TABLE I
Left and Right Coset Partitions of a GGA
(entries in units of d/2; e.g., (-3, -1, 1, 3) stands for
(-3d/2, -d/2, d/2, 3d/2))

Left Coset Partition

Subset 1: (-3, -1, 1, 3)  (-1, -3, 1, 3)  (1, -1, -3, 3)
          (-3, 1, -1, 3)  (-1, 1, -3, 3)  (1, -3, -1, 3)
Subset 2: (3, -3, -1, 1)  (3, -1, -3, 1)  (3, 1, -1, -3)
          (3, -3, 1, -1)  (3, -1, 1, -3)  (3, 1, -3, -1)
Subset 3: (-3, 3, -1, 1)  (-1, 3, -3, 1)  (1, 3, -1, -3)
          (-3, 3, 1, -1)  (-1, 3, 1, -3)  (1, 3, -3, -1)
Subset 4: (-3, -1, 3, 1)  (-1, -3, 3, 1)  (1, -1, 3, -3)
          (-3, 1, 3, -1)  (-1, 1, 3, -3)  (1, -3, 3, -1)

Right Coset Partition

Subset 1: (-3, -1, 1, 3)  (-1, -3, 1, 3)  (1, -1, -3, 3)
          (-3, 1, -1, 3)  (-1, 1, -3, 3)  (1, -3, -1, 3)
Subset 2: (3, -1, 1, -3)  (-1, 3, 1, -3)  (1, -1, 3, -3)
          (3, 1, -1, -3)  (-1, 1, 3, -3)  (1, 3, -1, -3)
Subset 3: (-3, 3, 1, -1)  (3, -3, 1, -1)  (1, 3, -3, -1)
          (-3, 1, 3, -1)  (3, 1, -3, -1)  (1, -3, 3, -1)
Subset 4: (-3, -1, 3, 1)  (-1, -3, 3, 1)  (3, -1, -3, 1)
          (-3, 3, -1, 1)  (-1, 3, -3, 1)  (3, -3, -1, 1)

A  theorem  concerning  the  interdistance  sets  sheds  some 
further  light  on  the  symmetry  properties  of  GGA’s. 

Theorem 3: Let $H$ be a normal subgroup of $G$. The partition of a
strongly regular GGA obtained by applying the left cosets of $H$ to the
initial set $X$ has the following property. The interdistance set
associated with any two cosets, say $S_1 H$ and $S_2 H$, is a function
only of the coset $S_3 H$, where $S_3 = S_1^T S_2$, and not of $S_1$,
$S_2$ separately.

Proof: Let $S_1$ and $S_2$ denote two coset leaders. If $X_i$, $X_j$ are
two (not necessarily distinct) vectors of the initial set $X$, and $S_h$,
$S_k$ are two elements of $H$, the distances among elements of the cosets
$S_1 H$ and $S_2 H$ include the quantities

$$d_{ij}(S_1, S_2, S_h, S_k) = \|S_1 S_h X_i - S_2 S_k X_j\|$$

as $S_h$, $S_k$ run through $H$ and $X_i$, $X_j$ run through $X$. We have

$$d_{ij}^2(S_1, S_2, S_h, S_k) = \|X_i\|^2 + \|X_j\|^2 - 2 X_i^T S_h^T S_1^T S_2 S_k X_j
= \|X_i\|^2 + \|X_j\|^2 - 2 X_i^T S_h^T S_3 S_k X_j.$$

Finally, as $H$ is a normal subgroup, we have

$$S_1^T H S_2 H = S_1^T S_2 H = S_3 H;$$

i.e., $S_3 H$ is another coset.

We now provide some examples of fair partitions of a GGA. Consider first
the rotation group which generates Alphabet 3 (see Fig. 2) and its
partition into the two cosets associated with the rotations $0, \pi$ and
$\pi/2, -\pi/2$, respectively. The GGA is fairly partitioned into the two
subalphabets {1, 2, 3, 4, 9, 10, 11, 12} and {5, 6, 7, 8, 13, 14, 15, 16}.
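This two-way partition of 16-QAM is easy to check by enumeration. The sketch below is an illustration only: since Fig. 2 is not reproduced here, it assumes that the four initial vectors are the first-quadrant points (1, 1), (3, 1), (1, 3), (3, 3), and the names `rotate`, `subalphabet`, and `intradistances` are hypothetical.

```python
from itertools import combinations

# Assumed initial set: the four first-quadrant points of 16-QAM.
INITIAL_SET = [(1, 1), (3, 1), (1, 3), (3, 3)]

def rotate(v, i):
    """Rotate v by i * 90 degrees using exact integer arithmetic."""
    x, y = v
    for _ in range(i % 4):
        x, y = -y, x
    return (x, y)

def subalphabet(coset):
    """Apply every rotation in the coset to every initial vector."""
    return {rotate(v, i) for i in coset for v in INITIAL_SET}

def intradistances(subset):
    """Set of squared Euclidean distances among pairs of points."""
    return frozenset((ux - vx) ** 2 + (uy - vy) ** 2
                     for (ux, uy), (vx, vy) in combinations(sorted(subset), 2))

Z1 = subalphabet({0, 2})    # rotations 0 and pi
Z2 = subalphabet({1, 3})    # rotations pi/2 and -pi/2

print(len(Z1), len(Z2), intradistances(Z1) == intradistances(Z2))
```

The second subalphabet is the first one rotated by $\pi/2$, which is why the two intradistance sets coincide, as Theorem 1 predicts for a strongly regular GGA.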

Fig. 1 shows a fair partition of Alphabet 2 into four subsets of eight
vectors each. This partition is obtained as follows: denote by $\alpha$
the orthogonal matrix whose effect on a vector is to cyclically shift its
components to the right by one position and to change the sign of the
second component. Then the set

$$H = \{\alpha^0, \alpha^1, \alpha^2, \alpha^3, \alpha^4, \alpha^5, \alpha^6, \alpha^7\}$$

is a cyclic normal subgroup of the group $G$ generating the alphabet, and
its cosets generate the fair partition.

A fair partition of Alphabet 4 into 16 subsets of eight vectors each
stems from the subgroup $\{I, -I\}$, where $I$ is the $4 \times 4$
identity matrix (see Fig. 3). A fair partition of Alphabet 1 is obtained
by considering the two cosets of the subgroup $\{R^i\}_{i=0}^{M-1}$.

Definition 6: Let $R$ be a left coset of $G$ in the fair partition of a
GGA and $S_g$ an element of $G$. We define the distance profile [15]
associated with $R$ and $S_g$ as the polynomial in the indeterminate $w$:

$$F(w, S_g, R) \triangleq \sum_{d^2} a(d^2)\, w^{d^2}$$

where $a(d^2)$ is the number of elements of $RX$ that have squared
distance $d^2$ with respect to an element of the set $S_g RX$. Note that
a given element of $RX$ may be accounted for more than once, as it
contributes with different squared distances with respect to different
elements of the set $S_g RX$. The sum of the $a(d^2)$ equals the square
of the cardinality of $RX$.

Example 2: Consider $K = 1$, $X_1 = (1, 0)^T$, and the group of plane
rotations

$$S_i = \begin{bmatrix} \cos(i\pi/2) & \sin(i\pi/2) \\ -\sin(i\pi/2) & \cos(i\pi/2) \end{bmatrix}, \quad i = 0, 1, 2, 3.$$

The subgroup $\{S_0, S_2\}$ is normal. The distance profiles are
summarized in Table II.

TABLE II
Distance Profiles for Example 2

R             S_g    F(w, S_g, R)
{S_0, S_2}    S_0    2w^0 + 2w^4
{S_0, S_2}    S_1    4w^2
{S_0, S_2}    S_2    2w^0 + 2w^4
{S_0, S_2}    S_3    4w^2
{S_1, S_3}    S_0    2w^0 + 2w^4
{S_1, S_3}    S_1    4w^2
{S_1, S_3}    S_2    2w^0 + 2w^4
{S_1, S_3}    S_3    4w^2
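The distance profiles of Example 2 can be recomputed mechanically. The following sketch is an illustration rather than the authors' procedure: it stores each profile as a `Counter` mapping the squared distance $d^2$ to its multiplicity $a(d^2)$, and the names `S`, `apply`, and `distance_profile` are assumptions.

```python
from collections import Counter

# Rotation matrices S_i = [[cos, sin], [-sin, cos]] at angle i*pi/2 (exact integers).
S = {
    0: ((1, 0), (0, 1)),
    1: ((0, 1), (-1, 0)),
    2: ((-1, 0), (0, -1)),
    3: ((0, -1), (1, 0)),
}
X1 = (1, 0)

def apply(m, v):
    """Multiply the 2x2 matrix m by the column vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def distance_profile(g, coset):
    """Counter {d^2: a(d^2)} between the sets R X and S_g R X."""
    RX = [apply(S[r], X1) for r in coset]
    SRX = [apply(S[g], u) for u in RX]
    return Counter((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2
                   for u in RX for v in SRX)

for coset in ((0, 2), (1, 3)):
    for g in range(4):
        print(coset, g, dict(distance_profile(g, coset)))
```

A profile such as `{0: 2, 4: 2}` corresponds to the polynomial $2w^0 + 2w^4$, and the multiplicities always sum to $|RX|^2 = 4$.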

Definition 7: A fair partition of a GGA is called homogeneous if the set
$\{F(w, S_g, R)\}_{S_g \in G}$ does not depend on $R$. It is called
strongly homogeneous if $F(w, S_g, R)$ does not depend on $R$ for any
$S_g$.

Theorem 4: If $G$ is a commutative group, all the partitions generated by
its subgroups are strongly homogeneous.
Proof: Let $H$ be a subgroup of $G$: this is obviously normal, so that
the partition induced by $H$ is fair. Let $X_i$, $X_j$ be two elements of
the initial set $X$, $S$ an element of $G$, and $S_H$, $S_{1H}$ two
elements of $H$. Then for any $S_g \in G$ the computation of
$F(w, S_g, SH)$ involves enumerating the squared distances

$$\|S S_H X_i - S_g S S_{1H} X_j\|^2 = \|S S_H X_i - S S_g S_{1H} X_j\|^2
= \|S_H X_i - S_g S_{1H} X_j\|^2$$

which do not depend on $S$ and hence on the element of the fair
partition.

Theorem 5: If $H$ is a subgroup of $G$ in a strongly regular GGA, the
partition generated by the left cosets of $H$ is homogeneous.

Proof: Let $H$ be a subgroup of $G$. Then the partition induced by the
left cosets of $H$ is fair. Let $X_i$, $X_j$ be two elements of the
initial set, $S$ an element of $G$, and $S_H$, $S_{1H}$ two elements of
$H$. Then for any $S_g \in G$ the computation of $F(w, S_g, SH)$ involves
enumerating the squared distances

$$\|S S_H X_i - S_g S S_{1H} X_j\|^2 = \|S_H X_i - S^T S_g S S_{1H} X_j\|^2
= \|S_H X_i - S_g' S_{1H} X_j\|^2$$

so that $F(w, S_g, SH) = F(w, S_g', H)$, and as $S_g$ runs through $G$ so
does $S_g' = S^T S_g S$. Thus the assertion is proved.

III. Multidimensional Coded Signals: Block Codes

We shall now see how the multidimensional alphabets described in the
previous section can be used in conjunction with codes to further enhance
their performance. In this section, we shall focus our attention on block
codes, while the next section will be devoted to convolutional (trellis)
codes.

Imai and Hirakawa [18] and Ginzburg [8] have described constructions
which make it possible to design alphabets with an arbitrary signal
distance and with a regular structure by employing algebraic properties
of block codes. Fig. 4 shows Ginzburg's construction. The $L$ block
encoders $C_1, C_2, \ldots, C_L$ accept source symbols, and output $L$
blocks $(q_{1i}, q_{2i}, \ldots, q_{Ni})$, $i = 1, \ldots, L$, of $N$
symbols each. The modulator $f$ maps each $L$-tuple
$(q_{j1}, \ldots, q_{jL})$, $j = 1, \ldots, N$, into the vector

$$x_j = f(q_{j1}, \ldots, q_{jL})$$

chosen from a set $A$ of $M = M_1 M_2 \cdots M_L$ elements. This mapping
is obtained as follows. In the set $A$ we define a system of $L$
partitions such that each class of the $l$th partition includes $M_l$
classes of the $(l-1)$th partition, so that it will consist of
$M^{(l)} = M_1 M_2 \cdots M_l$ signals. By numbering the classes of the
$(l-1)$th level occurring in a class of the $l$th level we can obtain a
one-to-one mapping of the set of classes of the $(l-1)$th partition onto
the set of integers $\{0, \ldots, M_l - 1\}$. Therefore, if the $q_{ji}$
are chosen in the set $\{0, \ldots, M_i - 1\}$, $i = 1, \ldots, L$, any
$L$-tuple $(q_{j1}, \ldots, q_{jL})$ defines a unique value of the $j$th
elementary signal $x_j = f(q_{j1}, \ldots, q_{jL})$ (see Fig. 5).

Fig. 5. Example of Ginzburg construction.

Ginzburg proved that the alphabet obtained in this way has a minimum
squared Euclidean distance $D^2$ that satisfies

$$D^2 \geq \min_{1 \leq i \leq L} (\delta_i^2 d_i)$$

where $d_1, \ldots, d_L$ are the minimum Hamming distances of the $L$
block codes $C_1, \ldots, C_L$, and $\delta_i^2$ is the minimum squared
Euclidean distance between the symbols in each subalphabet of the $i$th
partition.

Consider now Ginzburg's constructions based on generalized group
alphabets. By associating with each level the elements of a fair
partition (the concept of a fair chain can be used here), all the
subalphabets at a given level have the same minimum distance. From the
fair partition of Alphabet 2 described before, we have
$\delta_1^2 = 2/3$ and $\delta_2^2 = 2$. Thus, using the $(N, k, 3)$
Hamming code on GF(4) [9, pp. 193, 194] and the trivial $(N, N, 1)$ code
on GF(8), with $N = (4^m - 1)/3$, $k = N - m$, $m > 1$, we have
$D^2 \geq 2$. The resulting alphabet has a rate

$$R = \frac{5(4^m - 1) - 6m}{4(4^m - 1)}$$

and

$$D^2 \log_2 M \geq 10 - \frac{12m}{4^m - 1}.$$

For example, choosing $m = 2$ we get a rate $R = 1.05$ and
$D^2 \log_2 M \geq 8.4$; with $m = 3$ we get $R = 1.18$ and
$D^2 \log_2 M \geq 9.4$.
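The figures quoted for Alphabet 2 can be reproduced with exact rational arithmetic. This is a sketch under stated assumptions: each GF(4) information symbol is taken to carry 2 bits and each GF(8) symbol 3 bits, the rate is measured in bits per dimension, and $\log_2 M$ is read as the number of bits per transmitted four-dimensional signal (an interpretation that matches the quoted numbers). The function name `ginzburg_alphabet2` is hypothetical.

```python
from fractions import Fraction

def ginzburg_alphabet2(m):
    """Rate, distance bound, and D^2*log2(M) bound for the Alphabet 2 example."""
    N = (4 ** m - 1) // 3            # block length of the Hamming code on GF(4)
    k = N - m                        # its number of information symbols
    bits = 2 * k + 3 * N             # GF(4) code: 2 bits/symbol; trivial GF(8) code: 3 bits/symbol
    rate = Fraction(bits, 4 * N)     # bits per dimension (N four-dimensional signals)
    # Ginzburg's bound: min over levels of delta_i^2 * d_i
    d2 = min(Fraction(2, 3) * 3, Fraction(2) * 1)
    merit = d2 * Fraction(bits, N)   # D^2 times bits per four-dimensional signal
    return rate, d2, merit

for m in (2, 3):
    rate, d2, merit = ginzburg_alphabet2(m)
    print(m, float(rate), float(d2), float(merit))
```

For $m = 2$ this gives $N = 5$, $k = 3$, and 21 bits per block of five signals, i.e., $R = 21/20$.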


Fig. 4. Ginzburg's construction.

Using Alphabet 4 and the partition described, we have
$\delta_1^2 = 2c^2$ and $\delta_2^2 = 8c^2$. The (18, 15, 4) extended
Hamming code [19, p. 36] on GF(16) and the trivial (18, 18, 1) code on
GF(8) can be employed, providing a squared minimum distance
$D^2 \geq 8c^2 \approx 1.21$. This alphabet yields $R = 1.583$ and
$D^2 \log_2 M \geq 7.67$.

IV. Multidimensional Coded Signals: Trellis (Ungerboeck) Codes

We shall now see how an Ungerboeck code [10] can be designed using a
multidimensional alphabet generated as described in Section II. Such
codes can be specified as in [17]. Each coded symbol depends on
$k + \nu$ source bits, namely, the block $r = (a_1, \ldots, a_k)$ of $k$
bits generated by the source, plus $\nu$ bits preceding this block. The
$\nu$ bits determine one of the $N = 2^\nu$ states of the encoder, say
$\sigma = (a_{k+1}, \ldots, a_{k+\nu})$, $a_n = 0, 1$. The encoder state
for the next coded symbol is obtained by shifting the $a_n$ by $k$ places
to the right, dropping the rightmost $k$ bits and inserting on the left
the most recent source bits. The encoded symbol $x$ depends on $r$ and
$\sigma$; we write

$$x = f(r, \sigma) \qquad (4.1)$$

where $x$ is an element of a GGA. This encoding procedure can be
described using a trellis (Fig. 6 shows a section of such a trellis,
obtained for $\nu = 2$).
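The state update described above amounts to a shift-register operation. The sketch below is a hypothetical illustration: the function names `next_state` and `state_sequence` and the bit ordering inside a block are assumptions, and the mapping $f$ onto GGA symbols is deliberately left out because it depends on the chosen partition.

```python
def next_state(state, block):
    """Shift the state right by k places and insert the k newest bits on the left.

    state: tuple of nu bits (the encoder state);
    block: tuple of the k most recent source bits.
    """
    nu = len(state)
    return (tuple(block) + tuple(state))[:nu]   # rightmost k bits are dropped

def state_sequence(initial_state, blocks):
    """States visited by the encoder while reading successive k-bit blocks."""
    states = [tuple(initial_state)]
    for block in blocks:
        states.append(next_state(states[-1], block))
    return states

# Four-state example (nu = 2) with one source bit per state-forming block (k = 1).
path = state_sequence((0, 0), [(1,), (1,), (0,), (1,)])
print(path)
```

Each pair of consecutive states in `path` corresponds to one branch of the trellis; in a full encoder the branch label would be the symbol $x = f(r, \sigma)$.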


Fig. 6. Four-state trellis code for Alphabet 2.

We conjecture that a good code should show a good deal of symmetry, to be
reflected by the structure of the function $f$ in (4.1), or,
equivalently, by the assignment of symbols to the branches connecting any
pair of nodes in the code trellis (for further details see, e.g., [10],
[11]). This can be obtained in our framework by assigning to the branches
associated with each node the set of symbols obtained from a fair
partition of a GGA. This is equivalent to the procedure suggested in [10]
and called "mapping by set partitioning"; thus our procedure can be
viewed as a systematic way to achieve set partitioning.

The most widely used single parameter that specifies the performance of
these codes on the additive white Gaussian noise channel is the free
Euclidean distance. This can be computed using either a generating
function approach or a modified bidirectional search algorithm [20],
[21], or procedures based on the Viterbi algorithm and described in [23],
[24] (see also [25, pp. 561-564]). The generating function technique
consists of enumerating all possible distances between sequences of
symbols associated with paths in the trellis. In general [11], the
generating function can be obtained as the transfer function of a state
diagram regarded as a signal flow graph. The state diagram is defined
over an expanded set of $N^2 = 2^{2\nu}$ states. For the special case of
a trellis based upon a linear binary convolutional code and a strongly
homogeneous partition of a GGA, the minimum distance can be computed from
a generating function obtained as the transfer function of a state
diagram including only $N = 2^\nu$ states (see [15, Theorem 3]).

We shall describe two examples of four-dimensional Ungerboeck codes. The
first example originates from Alphabet 2. It has minimum squared distance
$2a^2 = 2/3 \approx 0.67$. The fair partition described before gives four
subsets of eight vectors each, with minimum squared intradistance
$6a^2 = 2$. By choosing a four-state trellis code with the structure
described in Fig. 6, we get a squared free distance $6a^2 = 2$. If this
figure is compared to the minimum distance achieved by using two
independent 4-PSK signals, which transmit the same amount of information
over the same number of dimensions, we see that an energy saving of 3 dB
is obtained.
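The 3 dB figure follows from a ratio of squared distances at equal energy. The short sketch below assumes, as a normalization not spelled out in the text, that the two independent 4-PSK signals share the same unit total energy as the four-dimensional alphabet, so that each has energy 1/2 and minimum squared distance $2 \cdot (1/2) = 1$; the function name `gain_db` is illustrative.

```python
import math

def gain_db(d2_coded, d2_reference):
    """Energy saving in dB implied by a ratio of squared minimum distances."""
    return 10 * math.log10(d2_coded / d2_reference)

# Reference: two independent 4-PSK signals with total energy 1 (assumed normalization).
energy_per_psk = 0.5
d2_two_4psk = 2 * energy_per_psk     # 4-PSK: squared minimum distance = 2 * energy

d2_free_alphabet2 = 2.0              # squared free distance 6a^2 of the trellis code
gain_alphabet2 = gain_db(d2_free_alphabet2, d2_two_4psk)
print(round(gain_alphabet2, 2))
```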

Consider now Alphabet 4. It has a minimum squared distance of 0.3. The
fair partition described gives 16 subalphabets of eight vectors each,
with minimum squared intradistance 1.2. By using the four-state
Ungerboeck code described in Fig. 7, the squared free distance obtained
is $d_{\text{free}}^2 = 1.2$. By comparing this to the minimum distance
obtained by using two independent 8/4-PSK signals, we see that an energy
saving of about 4.3 dB is obtained.

Fig. 7. Four-state trellis code for Alphabet 4.

V. Connections with Related Work

Recently, Calderbank and Sloane [22] have described a method of
constructing multidimensional trellis codes where the alphabet is a
finite subset of a lattice $L$, with an equal number of points from each
coset of a sublattice $M$ of $L$. As pointed out by the editor, the
symmetry and homogeneity properties of these alphabets are almost
identical to those of GGA's. Subalphabet edge effects are the reason why
the correspondence is not quite exact. To see how the two methods are
related, consider the partition of the integer lattice $L = Z^2$
generated by (1, 0) and (0, 1) into eight cosets of the sublattice $M$
generated by (2, 2) and (2, -2). The sublattice $M$ consists of all
vectors with norm divisible by 8. Let $T_{a,b}$ be the translation given
by $T_{a,b}: (x, y) \to (x + a, y + b)$. Let $X = \{(0, 1)\}$, and let
$G = \langle T_{1,0}, T_{0,1} \rangle$ be the group of all translations.
Define a chain of subgroups of $G$ by
$H_1 = \langle T_{1,1}, T_{1,-1} \rangle$,
$H_2 = \langle T_{2,0}, T_{0,2} \rangle$,
$H_3 = \langle T_{2,2}, T_{2,-2} \rangle$. This chain of subgroups
corresponds to the eight-way partition of $L$.


VI. Conclusion

Ginzburg [8] described a method of dividing a signal alphabet into a
chain of subsets via the action of a group of orthogonal matrices. We
generalize this approach by introducing generalized group alphabets, and
we consider the combination of these alphabets with block or trellis
codes. Some actual designs show that consideration of GGA's may lead to
transmission systems providing good performance with band-limited
channels at the price of a relatively modest complexity.
A discrete system with memory, inserted between the source and the
modulator, is designed with the aim of providing an equivalent channel
with a distortionless linear part and no nonlinearities up to a given
order. This design is based on a Volterra series model of the channel,
and on the theory of pth-order inverse systems.

Since the compensator design is based on a mathematical model of the
channel, the problem of model identification is considered. A modeling
technique is described, based on computer simulation and application of
orthogonal Volterra series. Several examples show the per-
As the demand for the spectrum increases, high-speed data transmission
over radio channels is likely to benefit from consideration of
high-capacity modulation formats, such as multilevel quadrature amplitude
modulation (QAM). Their application has been slowed down by the presence
of amplitude (AM/AM) and phase (AM/PM) nonlinearities in radio-frequency
(RF) power amplifiers driven at or near saturation for better efficiency.
Actually, the nonlinear distortions introduced by these amplifiers make
the standard channel model, i.e., the additive white Gaussian noise
channel, far from realistic, and hence system designs based on it far
from optimum.

On the additive white Gaussian noise channel, it is well known that QAM
signals with a rectangular constellation have a better bit-error rate
(BER) performance than phase-shift keying (PSK) with an equal number of
points. However, this situation seems to be reversed when nonlinear
distortions are present in the channel. To cope with this problem, the
simplest approach is to back off from the saturation point of the
amplifier characteristics.
verified (see, for example, [20]) that, as the number of energy levels
in the signal constellation increases, the TWT working point should be
backed off more to compensate for the nonlinear behavior of the
amplifier. In this situation, it may occur that the beneficial effect
of the increase in linearity is offset by the corresponding decrease
of the amplifier's output power. As a result, PSK (which has only one
energy level) may perform better than QAM (which has more) [20]. This
is why more sophisticated solutions are called for.

From the above discussion, it appears rather natural to investigate
two-dimensional signal constellations that outperform PSK, and yet do
not suffer excessive degradations due to channel nonlinearities. This
paper is devoted to this problem, through an approach that combines
the choice of the modulation format and the compensation of channel
nonlinearities.

The channel model on which our analysis will be based is
time-discrete. We assume for simplicity that the modulated signal is
sent through a nonlinear system with memory before being affected by
additive white Gaussian noise at its output. In other words, the
discrete channel consists of two separate parts: a noiseless
deterministic part, and a noise adder. Traditionally, there are two
philosophies intended to cope with the problem of channel
nonlinearities. One consists of accepting the channel as is, without
trying to do anything to modify its behavior, and designing the
receiver so as to minimize the joint effects of intersymbol
interference, nonlinearities, and noise. The most effective nonlinear
signal processing technique based on this approach is maximum
likelihood sequence estimation (MLSE), to be performed by the Viterbi
algorithm [12], [14], [17, ch. 10]. Unfortunately, this technique
requires a processing complexity which may make it unsuitable for
implementation at very high data rates. For this reason, suboptimum
receiver schemes are attractive: among them, we can recall nonlinear
equalization schemes [...], nonlinear cancelers [1], [2], optimum
linear equalizers [...], and optimum linear receiving filters [...].
It must be kept in mind, however, that a fundamental limit to the
performance of any of these receivers (and, more generally, of any
conceivable receiver, either linear or nonlinear) depends on the
minimum Euclidean distance between the signals observed at the output
of the noiseless deterministic part of the channel [...].

[...] the signal to be used by the receiver is affected by noise;
hence, any attempt to compensate for the channel distortion by
introducing a sort of "inverse distortion" will enhance the noise. For
this reason, it appears logical to investigate solutions based on the
compensation of the nonlinearity before noise addition. This procedure
should make the channel look as similar as possible to a Gaussian
channel.

If this approach is chosen, there are several factors and constraints
that should be kept in mind. One of them is, of course, the ultimate
performance that the nonlinearity compensation system can achieve. The
second one is the ease of design, the implementation complexity, and
the cost. The third is that the compensator itself may expand the
signal bandwidth, in spite of the fact that out-of-band emission must
be kept under control [7]. In fact, while a predistorter reduces
out-of-band emission after the amplifier, it may increase it before
the amplifier. This can be a problem, for example, in a satellite
system with the predistorter located in the ground station to
compensate for the on-board nonlinearity. Finally, in certain cases
provision must be made for adaptive compensation: in fact, when a
constellation with a large number of points is used by the modulator,
even variations in amplifier characteristics caused by temperature
changes, dc power variations, and component aging can degrade the
system performance [7]. Both analog and digital predistorters can in
principle be considered; however, besides being more complex and
expensive, and less flexible, the analog predistorters seem to perform
worse than their digital counterparts [5], [6]. Hence, consistent with
our assumption of a discrete channel model, we shall consider digital
predistortion.

In this paper we consider digital predistortion of a channel, i.e.,
the design of a device to be inserted at the transmitter front end of
the transmission system and whose aim is to compensate for the
unwanted effects of the nonlinear channel. This design will be based
on the concept of pth-order inverse of a nonlinear system. The theory
of pth-order inverses was developed by Schetzen (see [23, ch.
nonlinearities. Assume first that the channel has no memory (i.e., no
bandwidth-limiting components exist) and consider the effect of a
predistorter placed just before the nonlinear channel. In this
situation, which we shall refer to as memoryless predistortion, the
compensator acts by skewing the signal constellation in such a way
that, when passed through the nonlinear device, it will resume the
original shape (e.g., a rectangular 16 QAM structure). In other words,
the compensator task is to invert the discrete transmission channel.
This operation does not modify the spectrum, and hence the bandwidth
occupancy, of the transmitted signal, but of course its effectiveness
is critically dependent on the assumption that the channel has no
memory.

B. Predistortion with Memory. The pth-Order Inverse

Consider, instead, the more realistic assumption of a channel with
memory. In this situation, the compensator is faced with a far more
difficult task, the inversion of a nonlinear system with memory. Now,
not all nonlinear systems possess an inverse. Also, many systems can
be inverted only for a restricted range of input amplitudes
[23, p. 123 ff.]. However, it is always possible to define a pth-order
inverse, for which the input amplitude range is not restricted
[23, ch. 7].

Our use of the pth-order inverse theory will be based on a
Volterra-series model of the discrete nonlinear channel (see [17] and
the references therein). This model provides an exceedingly general
characterization of nonlinear systems with memory based on the
so-called Volterra kernels, a set of parameters which can be thought
of as the extension to the nonlinear case of the concept of impulse
response of a linear channel. Given a nonlinear system H, its
pth-order inverse is one that, when cascaded to H, results in a system
in which the first-order Volterra kernel is a unit impulse, and the
second- through the pth-order Volterra kernels are zero [23]. In other
words, if the pth-order nonlinear inverse channel is synthesized at
the transmitter's front end, the compensated transmission channel will
exhibit no linear distortion, and no nonlinear distortion up to order
p. Obviously, the performance of the pth-order compensated channel
will depend on the effect of the residual distortions.

C. Memoryless Predistortion Versus Predistortion with Memory

Before proceeding further with an analytical description of the
compensation based on pth-order channel inversion, it is convenient to
stop the discussion for a while, and provide an interpretation of the
two types of predistorters described in the previous subsection.
Memoryless predistortion is the operation of changing the (source)
symbols $a_n$ into the (channel) symbols $b_n = g(a_n)$, where
$g(\cdot)$ is a suitable complex function. If these modified symbols
are viewed as a new signal set entering the channel (and matched to
it), we may think of the compensator as being incorporated in the
modulator. In conclusion, the design of a predistorter for a channel
without memory is equivalent to the design of a new modulation scheme.
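The view of memoryless predistortion as the design of a new signal set can be illustrated with a minimal sketch. The channel below is a hypothetical memoryless nonlinearity (the coefficient `k` is an assumption, not a value from the paper), and the fixed-point iteration is just one convenient way to invert it numerically.

```python
def channel(b, k=0.10 + 0.05j):
    """Hypothetical memoryless nonlinearity: gain compression plus phase rotation."""
    return b * (1 - k * abs(b) ** 2)

def predistort(a, k=0.10 + 0.05j, iterations=100):
    """Find b = g(a) such that channel(b) = a, by fixed-point iteration."""
    b = a
    for _ in range(iterations):
        b = a / (1 - k * abs(b) ** 2)
    return b

# 16 QAM source alphabet, scaled so that the largest magnitude is 1.
levels = (-3, -1, 1, 3)
scale = abs(complex(3, 3))
alphabet = [complex(i, q) / scale for i in levels for q in levels]

# The predistorted alphabet is simply a new (skewed) signal set.
skewed = [predistort(a) for a in alphabet]

# After the channel, the skewed points fall back onto the original grid.
worst_error = max(abs(channel(b) - a) for a, b in zip(alphabet, skewed))
```

Here `skewed` plays the role of the new modulation alphabet: the modulator could store these sixteen points directly and no separate compensator would be needed.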
Consider then a predistorter with memory. Its operation consists of
transforming the source symbols $a_n$ into channel symbols $b_n$ whose
values depend not only on $a_n$ but also on $L$ previous symbols.
Thus,

$$b_n = g(a_n, a_{n-1}, \ldots, a_{n-L}). \tag{2.1}$$

If we define the state of the compensator at time $n$, and we denote
it by $\sigma_n$,

$$\sigma_n = (a_{n-1}, \ldots, a_{n-L}) \tag{2.2}$$

we can also write

$$b_n = g(a_n, \sigma_n) \tag{2.3}$$

which shows explicitly the dependence of the channel symbols $b_n$ on
the state of the compensator. This "sliding block" representation of
the compensator operation shows that the compensator itself is
equivalent to a trellis encoder. (This equivalence was first proved by
Calderbank and Mazo [24].) In conclusion, we can think of the design
of a predistorter with memory as of the choice of a trellis code, made
in order to compensate for the channel nonlinearity.


III. Compensation Based on pth-Order Channel Inversion

We start our discussion by considering the cascade of two nonlinear
systems (for motivation's sake, the reader can view one of the two
systems as the compensator, and the other one as the channel to be
compensated). We shall base our treatment of the subject on Volterra
series representations of bandpass systems [17, ch. 10], and we shall
use, for notational simplicity, tensor notations, as suggested in
[22]. These notations imply that any subscript occurring twice in the
same term is to be summed over the appropriate range of discrete time.
Thus, for example, we write $x_i y_i$ instead of
$x_1 y_1 + x_2 y_2 + \cdots$.


A. Cascading Bandpass Nonlinear Systems

Subject to certain regularity conditions, a bandpass nonlinear system
can be described by the input-output relationship

$$y_n = \sum_{n_1} h^{(1)}_{n_1} x_{n-n_1}
      + \sum_{n_1}\sum_{n_2}\sum_{n_3} h^{(3)}_{n_1,n_2,n_3}\, x_{n-n_1} x_{n-n_2} x^*_{n-n_3}
      + \sum_{n_1}\cdots\sum_{n_5} h^{(5)}_{n_1,\ldots,n_5}\, x_{n-n_1} x_{n-n_2} x_{n-n_3} x^*_{n-n_4} x^*_{n-n_5}
      + \cdots \tag{3.4}$$

Consider now two bandpass nonlinear systems. Let the first (the
compensator) be characterized by Volterra kernels $f$, the second (the
channel) by Volterra kernels $g$, and denote by $h$ the kernels of the
system resulting from the cascade of the two. The first-, third-, and
fifth-order $h$-kernels are explicitly given by

$$h^{(1)}_{n;a} = g^{(1)}_{n;r} f^{(1)}_{r;a} \tag{3.5}$$

$$h^{(3)}_{n;a,b,c} = g^{(1)}_{n;r} f^{(3)}_{r;a,b,c}
  + g^{(3)}_{n;r,w,z} f^{(1)}_{r;a} f^{(1)}_{w;b} f^{(1)*}_{z;c} \tag{3.6}$$

and

$$h^{(5)}_{n;a,b,c,d,e} = g^{(1)}_{n;r} f^{(5)}_{r;a,b,c,d,e}
  + g^{(3)}_{n;r,w,z} f^{(1)}_{r;a} f^{(1)}_{w;b} f^{(3)*}_{z;c,d,e}
  + g^{(3)}_{n;r,w,z} f^{(1)}_{r;a} f^{(3)}_{w;b,c,d} f^{(1)*}_{z;e}
  + \cdots \tag{3.7}$$

It can be observed that (3.5) expresses a relationship between
first-order kernels which is nothing but the discrete convolution of
impulse responses of linear systems.
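The remark that the first-order kernel of a cascade is the discrete convolution of the two impulse responses can be checked numerically. The sketch below uses hypothetical, time-invariant first-order kernels `f1` and `g1` (made-up values) and ignores all higher-order terms.

```python
def convolve(u, v):
    """Discrete convolution of two finite sequences."""
    out = [0j] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out

def fir(kernel, x):
    """Output of a linear time-invariant system with the given first-order kernel."""
    return convolve(kernel, x)

# Hypothetical first-order kernels (impulse responses) of the two systems.
f1 = [1.0, 0.3 - 0.1j]          # first system in the cascade
g1 = [0.9 + 0.2j, -0.2, 0.05j]  # second system in the cascade

h1 = convolve(g1, f1)           # first-order kernel of the cascade

x = [1, -1j, -1, 1j, 1]         # arbitrary test input
direct = fir(g1, fir(f1, x))    # pass x through f, then through g
via_h1 = fir(h1, x)             # pass x through the equivalent kernel
max_diff = max(abs(p - q) for p, q in zip(direct, via_h1))
```

The two outputs agree to rounding error, which is the linear special case of the cascade relations; the third- and fifth-order kernels require the additional product terms.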


B. Volterra Coefficients of pth-Order Compensator

Consider now pth-order compensation. Under the assumption that the
linear part of system $f$, i.e., the linear functional determined by
the first-order kernel of $f$, is invertible, it is possible to find a
system $g$ such that its cascade with $f$ gives a system with no
linear distortion, i.e.,

$$g^{(1)}_{n;r} f^{(1)}_{r;a} =
  \begin{cases} 1, & n = a \\ 0, & \text{elsewhere.} \end{cases} \tag{3.8}$$

This choice provides the first-order compensator. Equation (3.8)
expresses nothing but the Nyquist criterion for the absence of
intersymbol interference in the overall channel. In appearance, this
sounds like a rather pleasant result, as it shows that even when
dealing with a nonlinear channel the linear part must be designed (at
least, if the "pth-order criterion" is accepted) to be Nyquist's. In
the following, we shall see how the concept of "linear part of a
channel" must be correctly interpreted.

The third-order compensator is obtained by choosing $g^{(3)}$ so as to
have $h^{(3)} = 0$; by taking the discrete convolution of both sides
of (3.6) with $g^{(1)} g^{(1)} g^{(1)*}$, and recalling (3.8), we get

$$g^{(3)}_{n;x,y,z} = -\, g^{(1)}_{n;r}\, f^{(3)}_{r;a,b,c}\,
  g^{(1)}_{a;x}\, g^{(1)}_{b;y}\, g^{(1)*}_{c;z}. \tag{3.9}$$

The fifth-order compensator is obtained by choosing $g^{(5)}$ so as to
have $h^{(5)} = 0$; by taking the discrete convolution of both sides
of (3.7) with $g^{(1)} g^{(1)} g^{(1)} g^{(1)*} g^{(1)*}$, and
recalling (3.8), we get the required $g^{(5)}$.

From (3.4), it is seen that the system is characterized by its
Volterra kernels. Notice that only odd-order polynomials appear: this
is due to the bandpass nature of the nonlinearity.

Before going further, let us consider a special situation (which is
admittedly rather simplistic, but gives rise to considerations that
might be interesting). Assume that the channel nonlinearity is the
cascade of a linear system $L$ and a memoryless device $D$. The
pth-order compensator for this channel can be easily computed,
providing a result which matches intuition. In fact, it is the cascade
of a linear filter, the inverse of $L$ (say, $L^{-1}$), preceded by a
nonlinear memoryless device, the pth-order inverse of $D$. Notice that
the cascade of $L^{-1}$ and $L$ gives rise to a Nyquist filter. This
result shows that one way to compensate for the channel nonlinearity
in this case consists of removing the channel memory and compensating
for the resulting memoryless nonlinearity by memoryless predistortion.

A more realistic model, suitable as an approximation to a number of
single-channel digital satellite communication systems, assumes that
the linear part of the channel has already been compensated by a
suitable combination of channel filtering and linear equalization at
the receiver's front end. In this situation some simplifications
arise. In particular we get, for the first- and third-order
compensators,

$$g^{(1)}_{n;a} = \delta_{n,a}, \qquad
  g^{(3)}_{n;a,b,c} = -f^{(3)}_{n;a,b,c}. \tag{3.10}$$

C. The Effect of Compensation on Power Spectrum

We consider now the effect of a pth-order compensator on the signal
power density spectrum. The continuous-time signal at the modulator
output can be given the form

$$x(t) = \sum_{n=-\infty}^{\infty} b_n\, s(t - nT) \tag{3.11}$$

where $(b_n)$ is the sequence of channel symbols, $T$ is the symbol
period (equivalently, $T^{-1}$ is the baud rate), and $s(t)$ is the
basic waveform used by the modulator. The power density spectrum of
signal (3.11) is given by (see [17, p. 33])

$$G(f) = \frac{1}{T}\, |S(f)|^2 \left[ \sum_{l} \beta_l
  \exp(-j 2\pi f l T) \right] \tag{3.12}$$

where $\beta_l$ is the autocorrelation of the symbol sequence at the
compensator output and $S(f)$ is the Fourier transform of $s(t)$. It
is easily recognized that the brackets on the RHS of (3.12) contain
the discrete Fourier transform of the sequence $(\beta_l)$, i.e., the
power density spectrum of the sequence at the compensator output. This
is a periodic function of $f$ with period $1/T$.

From (3.12), we see that the spectrum shaping effect of the
compensator can be analyzed by evaluating the autocorrelation sequence
$(\beta_l)$. For example, a linear compensator responding to the
source symbol sequence $(a_n)$, $E|a_n|^2 = 1$, with the sequence
$(a_n + A a_{n-1})$, $A$ a real constant, will cause a spectral
shaping $(1 + A^2) + 2A \cos(2\pi f T)$. A fact that might be
unexpected a priori is that the nonlinear terms of the compensator may
be irrelevant in shaping the spectrum. Consider, as an example, the
compensator output $a_n + A a_{n-1} + B a_n a_{n-1} a^*_{n-2}$. By
direct calculation, it can be seen that $\beta_0 = 1 + A^2 + B^2$ and
$\beta_{\pm 1} = A$, while $\beta_l = 0$ for $|l| \ge 2$. Hence, the
third-order nonlinearity has, for $B^2 \ll 1$ (as is the case when
relatively mild nonlinearities must be compensated), very little
effect.
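The direct calculation of the autocorrelation can be reproduced by exhaustive averaging. The sketch below assumes independent, equiprobable four-phase PSK source symbols and takes the third-order term of the compensator output to be $B a_n a_{n-1} a^*_{n-2}$; both choices, and the values of `A` and `B`, are assumptions made for illustration.

```python
from itertools import product

# Unit-energy, zero-mean source alphabet (assumed i.i.d. and equiprobable).
QPSK = (1 + 0j, 1j, -1 + 0j, -1j)

def output(a0, a1, a2, A, B):
    """Compensator output for the current symbol a0 and the two previous ones a1, a2."""
    return a0 + A * a1 + B * a0 * a1 * a2.conjugate()

def autocorrelation(lag, A, B):
    """beta_lag = E[c_{n+lag} c_n^*], averaged exactly over all symbol tuples."""
    span = lag + 3                        # symbols a_{n+lag}, ..., a_{n-2}
    total = 0j
    for s in product(QPSK, repeat=span):  # s[0] = a_{n+lag}, ..., s[-1] = a_{n-2}
        c_late = output(s[0], s[1], s[2], A, B)
        c_early = output(s[lag], s[lag + 1], s[lag + 2], A, B)
        total += c_late * c_early.conjugate()
    return total / len(QPSK) ** span

A, B = 0.3, 0.1
beta = [autocorrelation(lag, A, B) for lag in range(3)]
```

With these values the nonlinear term changes only $\beta_0$, by $B^2 = 0.01$, so the bracketed factor of the spectrum is essentially that of the linear compensator.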

D. Computing the Linear Part of the Compensator

The computation of the linear part of the compensator, i.e., of the
kernels $g^{(1)}$ that solve (3.8), deserves some further attention.
By rewriting explicitly (3.8), we have

$$\sum_{n} g^{(1)}_{n} f^{(1)}_{k-n} = \delta_{k,0} \tag{3.13}$$

where $\delta$ denotes the Kronecker symbol. Since we are interested
in a finite-complexity compensator, we consider a (perhaps
approximate) solution of (3.13) which includes just a finite number of
terms in the summation. Thus, our problem is equivalent to the design
of a zero-forcing equalizer of finite length. Two technical
assumptions are necessary here, namely, that there exists only a
finite number of nonzero $f^{(1)}$-kernels, and that the polynomial
whose coefficients are these kernels has no root with unit magnitude.
Under these conditions, a solution exists for the kernels $g^{(1)}$
with values that decrease in magnitude away from a "center kernel."
The procedure for computing these kernels, which requires finding the
roots of a polynomial and the solution of a set of linear equations,
can be found in [27].
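A truncated solution of the zero-forcing condition is easy to sketch in the special case where all roots of the channel polynomial lie inside the unit circle, so that the inverse is causal. This is a simplification of the general procedure of [27], which also covers two-sided solutions; the kernel values below are hypothetical.

```python
def convolve(u, v):
    """Discrete convolution of two finite sequences."""
    out = [0j] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out

def causal_inverse(f, length):
    """First `length` taps g solving sum_n g_n f_{k-n} = delta_{k,0} for k < length."""
    g = [1 / f[0]]
    for k in range(1, length):
        acc = sum(g[k - i] * f[i] for i in range(1, min(k, len(f) - 1) + 1))
        g.append(-acc / f[0])
    return g

# Hypothetical first-order channel kernels; both roots of the associated
# polynomial have magnitude below one, so the causal inverse decays.
f = [1.0, 0.4 + 0.2j, -0.1j]
g = causal_inverse(f, 40)

overall = convolve(g, f)   # should approximate a unit impulse
error = max(abs(v - (1 if k == 0 else 0)) for k, v in enumerate(overall))
```

The residual intersymbol interference comes only from the truncation and shrinks geometrically as more taps are retained.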

IV. The Role of Channel Modeling: Orthogonal Volterra Series

From our preceding discussion it is seen that the compensator design
is based on a Volterra-series model for the nonlinear transmission
channel. Thus, the availability of such a model is crucial. As, apart
from some very simple cases, analytical evaluation of Volterra
coefficients is not feasible, computational techniques should be used.
Basically, two approaches are available, which we shall refer to as
"block modeling" and "global identification."

Consider first block modeling. It is based on a model of the channel
as a cascade of linear, time-invariant filters and bandpass nonlinear
devices whose input-output relationships are given in the form of a
Taylor series. In this case, the Volterra kernels are evaluated by
combining the input-output relationships of the building blocks that
form the channel (see, for example, [19]). Although this approach is
apparently simple and straightforward, particularly when the channel
itself is composed of a reduced number of blocks, in its application
some care must be exercised by taking two important points into
consideration. First of all, in many cases the number of nonzero
Volterra coefficients is so large that the number of computations
involved in evaluating higher order coefficients may be impractical.
The second one is more subtle. Perhaps the most important fact to be
kept in mind when considering the identification of the channel is
that nonlinear systems behave differently for different input signals.
To understand the consequences of this statement, consider a simple
example. Assume we are dealing with a channel responding to the input
sequence $x_n$ with the sequence $y_n = \alpha x_n + \beta x_n^3$, and
assume that $x_n$ can take only the values $+1$ or $-1$. Under these
conditions, as $x_n = x_n^3$, the system behaves as linear, with
input-output relationship $y_n = (\alpha + \beta) x_n$. On the other
hand, if the input sequence can take values $-3$, $-1$, $1$, and $3$,
the system really behaves as nonlinear. Hence, we realize that in a
Volterra-series model each one of the nonlinear terms affects the
transmitted sequence differently if different modulation formats are
used. As an example, the third-order Volterra kernel $h^{(3)}_{011}$
has a different behavior on PSK and QAM signals. In fact, this kernel
multiplies a term

$$b_n\, b_{n-1}\, b^*_{n-1}.$$

For PSK, $b_{n-1} b^*_{n-1} = |b_{n-1}|^2 = \text{constant}$, and
hence the kernel contributes to linear distortion only. As a
conclusion, the Volterra kernels should be rearranged, after
computation, to account for effects like this. Besides operating by
inspection, a general way to reduce the Volterra coefficients in order
to account for the modulation format at hand is based on an orthogonal
Volterra series. We shall consider this point further on.
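The two-level versus four-level example can be verified in a few lines. The values of `alpha` and `beta` below are arbitrary choices made for illustration.

```python
def channel(x, alpha=1.0, beta=0.25):
    """Memoryless example: y = alpha*x + beta*x**3 (coefficient values are assumed)."""
    return alpha * x + beta * x ** 3

def apparent_gains(alphabet):
    """Set of ratios y/x seen over the alphabet; a single value means 'looks linear'."""
    return {channel(x) / x for x in alphabet}

gains_binary = apparent_gains([-1, 1])          # one gain: alpha + beta
gains_4level = apparent_gains([-3, -1, 1, 3])   # two gains: the cubic term is visible
```

An identification procedure driven by the binary alphabet could therefore never separate `alpha` from `beta`; only their sum is observable, which is why the kernels must be reduced for the modulation format at hand.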

Consider then global identification. This is entirely based on
computer simulation, and consists of identifying the Volterra kernels
of the transmission system (already in their reduced version) through
a gradient algorithm (see [17, ch. 10] for further details about the
identification algorithm). Using global identification, the reduction
problem mentioned above can be solved at once by using what we call an
orthogonal Volterra series, a type of expansion that depends on the
channel input characteristics and does not need any further reduction.

A.  Underlying  Theory 

The Volterra expansion (3.4) has the structure of a Taylor series, and
as such shares with the Taylor series some negative features. For
example, it might be inadequate to represent highly nonlinear systems,
or, equivalently, nonlinear systems with large outputs. Moreover, the
Volterra model of a given channel may not be improved by adding more
terms to the series. Finally, even when the channel input sequence
consists of independent random variables, the terms in (3.4) are not
even uncorrelated. Now, many of the drawbacks of the Volterra series
can be fixed up by using orthogonal polynomial expansions. These
consist of using an input-output relationship of the type

$$y_n = h^{(1)}_{n;a}\, Q^{(1)}(x_a) + h^{(2)}_{n;a,b}\, Q^{(2)}(x_a, x_b) + \cdots \tag{4.1}$$

where $Q^{(i)}$ denotes a polynomial of degree $i$ that is orthogonal
with respect to the sequence of random variables $(x_n)$. More
precisely, the expectation $E[Q^{(i)}(\cdot)\, Q^{(j)*}(\cdot)]$ is
equal to zero if $i \ne j$, or if $i = j$ but the arguments of
$Q^{(i)}$ and $Q^{(j)}$ are not a permutation of each other. If it is
assumed that the sequence $(x_n)$ is a stationary sequence of
independent, identically distributed random variables, the
construction of these orthogonal polynomials is a relatively
straightforward task. In fact, the resulting polynomials turn out to
be a generalization of multidimensional Hermite polynomials, as
defined by Grad [25]. They can be constructed, by using an observation
of Zadeh [26], according to the following rule:

$$Q^{(i)}(x_{m_1}, x_{m_2}, \ldots, x_{m_i}) =
  P_{n_1}(x_{k_1}) \cdots P_{n_l}(x_{k_l}) \tag{4.2}$$

where $n_1, \ldots, n_l$ denote the number of indexes of the arguments
of $Q^{(i)}$ equal to $k_1, \ldots, k_l$, respectively, and
$P_n(\cdot)$ are polynomials in a single indeterminate orthogonal with
respect to the random variable $x_n$, i.e.,

$$E[P_i(x_n)\, P_j^*(x_n)] = 0 \quad \text{for } i \ne j$$

where $E[\cdot]$ denotes expectation with respect to $x_n$. For
example,

$$Q^{(3)}(x_1, x_2, x_2) = P_2(x_2)\, P_1(x_1).$$

Consider then the problem of constructing the polynomials $P(\cdot)$.
They can be found using a procedure based on the selection of a
sequence of linearly independent monomials in the variable $x_n$, say
$f_0, f_1, \ldots$. Explicit formulas are (see also [22, p. 608 ff.])

$$P_0 = f_0, \qquad
  P_k = \det \begin{bmatrix}
    f_k & f_{k-1} & \cdots & f_0 \\
    E[f_k f_{k-1}^*] & E[|f_{k-1}|^2] & \cdots & E[f_0 f_{k-1}^*] \\
    \vdots & \vdots & & \vdots \\
    E[f_k f_0^*] & E[f_{k-1} f_0^*] & \cdots & E[|f_0|^2]
  \end{bmatrix}.$$
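An equivalent way to obtain polynomials orthogonal with respect to a discrete symbol distribution is Gram-Schmidt orthogonalization, which yields the same polynomials as the determinant formula up to a scale factor. The sketch below uses Gram-Schmidt rather than determinants, represents each polynomial by its values on the alphabet, and assumes a hypothetical real four-level alphabet with equiprobable symbols.

```python
def expectation(values_p, values_q):
    """E[p(x) q(x)^*] for x uniformly distributed over the alphabet."""
    return sum(p * q.conjugate() for p, q in zip(values_p, values_q)) / len(values_p)

def orthogonalize(monomials, alphabet):
    """Gram-Schmidt on the monomials, each represented by its values on the alphabet."""
    basis = []
    for f in monomials:
        v = [complex(f(x)) for x in alphabet]
        for p in basis:
            coeff = expectation(v, p) / expectation(p, p)
            v = [vi - coeff * pi for vi, pi in zip(v, p)]
        basis.append(v)
    return basis

alphabet = [-3, -1, 1, 3]   # hypothetical real four-level alphabet
monomials = [lambda x: 1, lambda x: x, lambda x: x ** 2, lambda x: x ** 3]

P = orthogonalize(monomials, alphabet)

# Largest cross-correlation between distinct polynomials (should vanish).
worst = max(abs(expectation(P[i], P[j])) for i in range(4) for j in range(i))
```

On this alphabet the third polynomial comes out as $x^3 - 8.2\,x$, since $E[x^4]/E[x^2] = 41/5$; a different symbol distribution would give different coefficients, which is exactly why the expansion depends on the modulation format.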

B.  Application  to  Digital  Radio  Modulation  Systems 


In  our  situation,  we  can  start  from  the  sequence  of 
monomials 


This sequence must be reduced by taking into account the particular
type of modulation scheme involved, which may render some of the
monomials linearly dependent. For example, with unit-energy PSK we
have $|x_n|^2 = 1$, and consequently the fourth, the eighth, and the
ninth monomials above must be deleted from the list. Furthermore, for
four-phase PSK we have $x_n^3 = \pm x_n^*$, which causes the seventh
and the tenth monomial to be deleted, too.
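The linear dependences that force monomials out of the list can be checked directly on the alphabets. The sketch below is only a numerical check of the two identities mentioned above; the helper names are hypothetical, and the sign in the four-phase relation is seen to depend on the orientation of the constellation.

```python
import cmath

def psk(m, offset=0.0):
    """Unit-energy m-ary PSK alphabet with an optional phase offset."""
    return [cmath.exp(1j * (2 * cmath.pi * k / m + offset)) for k in range(m)]

def max_deviation(alphabet, lhs, rhs):
    """Largest |lhs(x) - rhs(x)| over the alphabet; ~0 means the monomials coincide."""
    return max(abs(lhs(x) - rhs(x)) for x in alphabet)

# |x|^2 = 1 on any unit-energy PSK alphabet, so x*|x|^2 duplicates x.
dev_energy = max_deviation(psk(8), lambda x: x * abs(x) ** 2, lambda x: x)

# Four-phase PSK: x^3 = +x* with symbols on the axes, x^3 = -x* on the diagonals.
dev_axes = max_deviation(psk(4), lambda x: x ** 3, lambda x: x.conjugate())
dev_diag = max_deviation(psk(4, cmath.pi / 4), lambda x: x ** 3, lambda x: -x.conjugate())

# For 8 PSK neither sign works, so x^3 remains an independent monomial.
dev_8psk = min(
    max_deviation(psk(8), lambda x: x ** 3, lambda x: s * x.conjugate())
    for s in (1, -1)
)
```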

Finally, the polynomials $Q^{(i)}$ associated with the particular
modulation scheme can be constructed as follows. Use first rule (4.2),
then delete the polynomials which correspond to the components of the
channel output falling outside of the bandwidth of interest (see
[17, pp. 542 ff.] for further details). In practice, this corresponds
to keeping only the terms of the type $x$, $x x x^*$,
$x x x x^* x^*$, etc. For example, the Q polynomials for unit-energy
PSK are, up to order three

[...]

Similarly, for unit-energy 16 QAM we have


V. Some Examples of Application

We shall now consider some examples of applications of the concepts
outlined in previous sections. Examination of a few simple situations
will allow us to show the applicability of this theory, and will
hopefully enhance its comprehension.

We deal with a nonlinear channel modeled using a bandpass orthogonal
Volterra series whose coefficients for PSK signaling are given in
[17, p. 566]. This channel results from the cascade of a rectangular
shaping filter, a fourth-order Butterworth filter with 3 dB bandwidth
1.7/T (T is the signaling period), a typical TWT amplifier exhibiting
both AM/AM and AM/PM conversion, and a second-order Butterworth filter
with 3 dB bandwidth 1.1/T. The amplifier is driven at saturation when
the sequence at the input of the discrete channel has magnitude 1.
(See [17, ch. 10] for more details about this channel.) We proceed to
compensate for this channel by inserting in front of it a nonlinear
device with memory obtained as an approximation of the channel
inverse. In particular, we denote by $(r_1, r_3, \ldots, r_p)$ the
compensator obtained by retaining in it only $r_1$ first-order
Volterra coefficients, $r_3$ third-order coefficients, etc. Thus, for
example, (3, 1) indicates a third-order compensator with three
first-order and one third-order coefficients. The coefficients chosen
are those whose indexes are the same as those of the Volterra
coefficients of the channel having the largest magnitudes. Our
computational experience has shown this choice to be the most
effective, although no formal proof of its optimality has been
obtained yet.
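The selection rule for an $(r_1, r_3)$ compensator amounts to ranking the channel coefficients of each order by magnitude. A minimal sketch follows; the coefficient values are made up for illustration only and are not those of [17, p. 566], and the dictionary layout is an assumption.

```python
def select_largest(kernel, count):
    """Keep the `count` coefficients of largest magnitude.

    `kernel` maps index tuples to complex coefficient values.
    """
    ranked = sorted(kernel.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked[:count])

# Made-up channel coefficients, for illustration only.
first_order = {(0,): 1.2 + 0.8j, (1,): 0.1 - 0.05j, (2,): -0.02j, (3,): 0.3 + 0.1j}
third_order = {(0, 0, 0): 0.05 - 0.1j, (0, 1, 1): 0.01j, (1, 0, 0): 0.02}

# Index sets retained by a (3, 1) compensator under this rule.
r1, r3 = 3, 1
kept_first = sorted(select_largest(first_order, r1))
kept_third = sorted(select_largest(third_order, r3))
```

The compensator coefficients themselves would then be computed only at the retained indexes.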

Consider  lirst  transmitting  ;m  8  PSK  symbol  sequence 
driving  the  amplifier  tit  saturation.  The  reduced  Volterra 
kernels  tire  listed  in  big  I.  The  symbols  are  exp  (  j 0). 
exp  (  j~  '4).  •  •  •  .  exp  (  jlx/4).  Without  any  compen¬ 
sation.  the  samples  ol  the  received  signal  form  the  eon- 
x'ellatmn  shown  in  big  2.  where  only  the  first  quadrant 
i,  shown  tor  sake  ol  clarity.  It  ti  (  I.  I  )  compensator  is 
used,  the  corresponding  constellation  looks  like  big.  3. 
The  reduction  in  the  constellation  spread  is  apparent.  No¬ 
tice  alsci  the  phase  rotation  introduced,  which  compen¬ 
sates  lot  the  tot.ition  caused  by  the  amplifier's  AM/PM 
A  (4.  I  i  compensator  gives  the  result  shown  in  big  4. 
while  ih  ■  elici  t  ol  a  f  t.  3)  compensator  is  depicted  in 
big  S 

fur  Ui  QAM  signals  with  the  Inchest  energy  level  die. 
mg,  die  amplifier  at  salination,  the  channel  quality  without 
eompeiis.iimn  is  even  less  sahsfai  lory  big  0  shows  die 
received  ■  onslell.ition  in  the  lust  quadrant  it  is  seen  dial 
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Fig.  I .  A  sot  of  Voltcrra  kernels  for  a  PSK  channel 
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2.  Sii*ii.il  .it  I  ho  output  of  the  channel  ot  l;t  I  when  S 

PSK  is  used 

two  clusters  overlap  The  effect  of  a  (  I .  I  )  compensator 
is  shown  in  big.  7,  while  l;ig.  8  depicts  the  effect  of  a  (4, 
I  )  compensator  Similar  results  have  been  obtained  lor  16 
PSK:  see  big  6  (uncompensated  channel),  big  10  |(  I, 
I  )  compciisntoi  |.  I'ig  II  |(4,  1  )  compensator),  and  big. 
12  |(4.  s  )  compensator! 

For all these situations, the effect of the compensator on the power density spectrum was evaluated, and found to be practically irrelevant: actually, the difference between the power spectra of uncompensated and compensated signals never exceeded a fraction of a decibel.

Fig. 9. Signal constellation at the output of the channel of Fig. 1 when 16 PSK is used.

Fig. 10. Same as in Fig. 9, with a (1, 1) compensator.

Fig. 12. Same as in Fig. 9, with a (4, 5) compensator.

Consider then the case of a channel whose linear part has been designed to satisfy the Nyquist criterion for no intersymbol interference. Specifically, assume the transmitter and receiver filters to have the common shape of a square-root raised cosine, with a rolloff factor 0.5. The channel between them is modeled through a nonlinear amplifier exhibiting AM/AM and AM/PM conversion effects, driven at saturation, and whose input-output characteristics are described using a model due to Saleh (see [28, eqs. (1)-(5)]) with parameters

$$\alpha_a = 1.9638, \qquad \alpha_\phi = 2.5293,$$

$$\beta_a = 0.9945, \qquad \beta_\phi = 2.8168.$$
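As a minimal sketch of the amplifier model, the following Python fragment evaluates the AM/AM and AM/PM curves with the parameters above. It assumes the usual form of Saleh's two-parameter formulas, $A(r) = \alpha_a r/(1+\beta_a r^2)$ and $\Phi(r) = \alpha_\phi r^2/(1+\beta_\phi r^2)$; the function names are ours and are not taken from [28].

```python
import math

# Parameter values quoted in the text.
ALPHA_A, BETA_A = 1.9638, 0.9945      # AM/AM parameters
ALPHA_PHI, BETA_PHI = 2.5293, 2.8168  # AM/PM parameters

def am_am(r):
    """Output amplitude for input amplitude r (assumed Saleh form)."""
    return ALPHA_A * r / (1.0 + BETA_A * r * r)

def am_pm(r):
    """Phase shift in radians for input amplitude r (assumed Saleh form)."""
    return ALPHA_PHI * r * r / (1.0 + BETA_PHI * r * r)

def amplifier(x):
    """Apply both conversions to a complex baseband sample x."""
    r = abs(x)
    if r == 0.0:
        return 0j
    phase = am_pm(r)
    return am_am(r) * (x / r) * complex(math.cos(phase), math.sin(phase))

# The AM/AM curve has its maximum at r = 1/sqrt(BETA_A): the saturation point.
r_sat = 1.0 / math.sqrt(BETA_A)
rotation_deg = math.degrees(am_pm(r_sat))
```

Under these assumptions the rotation at saturation comes out slightly below 40 degrees, in line with the value quoted in the discussion of block identification.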

Block identification of this channel turns out to provide rather disappointing results. For example, we get a center linear kernel whose value is $1.97 + j0.08$, which fails to account for the rotation (about 40°) introduced by the amplifier at its saturation point. We need to reduce the Volterra expansion obtained by block identification, or, even better, to use global identification and orthogonal polynomials. This operation provides the coefficients for the orthogonal Volterra series. The largest among them are listed, up to order three, in Fig. 13. It can be seen that the central linear coefficient reflects the phase rotation caused by the nonlinear amplifier. Figs. 14 and 15 provide a comparison among the scattering diagrams of 8 PSK and 16 PSK, respectively, at the output of a channel with (1, 0) compensation (i.e., compensated only for the rotation and the amplitude scaling) and (3, 6) compensation. Inspection of those scattering diagrams shows that the effect of the third-order compensator, although evident, is less dramatic than for the cases considered previously.

Fig. 13. A set of orthogonal Volterra-series coefficients.

Fig. 14. Signal constellations at the output of the channel modeled by the coefficients of Fig. 13 for 8 PSK. (a) With (1, 0) compensation. (b) With (3, 6) compensation.

Fig. 15. Signal constellations at the output of the channel modeled by the coefficients of Fig. 13 for 16 PSK. (a) With (1, 0) compensation. (b) With (3, 6) compensation.

VI. CONCLUSIONS

We have considered the design of digital compensators for nonlinear channels. Our design is based on the theory of pth-order inverses of nonlinearities, and on a computer-aided analysis of the system to be compensated. A number of examples were worked out to show the applicability of this approach. In principle, it is possible to compensate a given channel to any desired degree of accuracy. However, obvious complexity limitations make the approach presented here more useful when the uncompensated channel is strongly nonlinear. In fact, if only a third-order compensator is allowed, our results show that it will work better with a channel with a few strong low-order nonlinearities than with a channel which has small nonlinear Volterra coefficients, but many of them are of a higher


order. In the latter case, a certain amount of power backoff may prove more beneficial than a nonlinear compensation. (Notice that the backoff can be included in our model by simply multiplying the right-hand side of (3.13) by a factor smaller than one.)

GROUP CODES AND SIGNAL DESIGN
FOR DIGITAL TRANSMISSION

I - INTRODUCTION

Symmetry  seems  to  be  a  feature  intrinsic  to  every  life  process.  It 
should  be  a  very  stimulating  undertaking  to  discuss  the  fundamental 
role  played  by  symmetry  in  art,  music,  chemistry,  biology,  physics, 
computer  science  and  more  generally  in  every  mathematical  science.  A 
fascinating  sample  of  this  subject  was  provided  by  H.  Weyl  [53]  in  his 
last  book  dedicated  to  a  synthetic  view  of  symmetry.  Nevertheless  in 
this  paper  we  limit  our  considerations  to  the  key  role  of  symmetry  in 
communication  theory.  In  this  field  symmetry  plays  an  indispensable 
part  in  reducing  the  complexity  of  every  data  transmission  scheme. 

The algebraic notion of group underlies both the geometrical description of digital signals proposed by Shannon [43], and the geometrical methods of error control codes developed shortly after Shannon's work. However the introduction and systematic use of the methodology, machinery and language of group theory in both coding theory and signal design must be ascribed to Slepian [2,3].

In some way Slepian's approach parallels Klein's Erlangen program on the foundation of geometry: all geometric objects and concepts can be formulated starting from the abstract notion of group which provides the appropriate tool for every useful and applied mathematical theory. In Klein's words "a geometry is defined by a group of transformations, and investigates everything that is invariant under the transformations of the given group". In our context the main object left invariant by the group is a code, as will be defined later.

The Shannon theory of any communication process shows that the information is inherently discrete and also that the quantity of information that can be processed by every practical system is finite.

Signals for sending information over physical channels are essentially time- and frequency-limited; as a consequence the dimension of the signal space is finite. The signal energy, defined as the integral of the signal square over its finite time interval, induces a Euclidean metric in this signal space. Therefore, by using an orthonormal basis, we associate to each signal a point (or vector) in a finite-dimensional Euclidean space. In this way a finite set of signals corresponds to a finite constellation of points that we call a code.

Early in the fifties Slepian introduced the concept of group code in the design of signal sets for the Gaussian channel. A group code is a set of M unit vectors spanning an n-dimensional real space, on which the matrices of a finite group representation operate transitively.

A straightforward generalization of Slepian's group codes is obtained by considering a set of initial vectors instead of just one vector. The resulting set of vectors is called a generalized group alphabet.

The  present  awakening  of  interest  in  group  codes  is  due  to  their  in¬ 
creasing  use  in  transmission  schemes  of  combined  modulation  with  either 
convolutional  or  block  codes,  an  approach  initiated  by  Ungerboeck. 

A fundamental problem for Slepian's group codes is the choice of the initial vector that maximizes the minimum distance. A second basic problem concerns the existence of group codes for every pair of integers M and n with M greater than n. The classification of all configurations of given dimension is constructively important. As far as we know, only the classification in dimension three is complete. The same problems, formulated for generalized group alphabets, seem even more difficult.


However the field is wide and deserves investigation either from a purely theoretical point of view or for practical applications.

We are aware of the fact that the theory of group codes is still incomplete, but the open problems really challenge human thinking and stimulate the research work of engineers and mathematicians alike.


II - SIGNAL SETS: THE GEOMETRICAL MODEL

Signals for sending information are essentially limited both in time and frequency. According to a point of view accepted in the past, the simultaneous concentration attainable in both domains is limited by an uncertainty principle, so named after the analogous relations in quantum mechanics. Moreover energy constraints are imposed for practical purposes.

Finite  bandwidth  W  and  finite  time  duration  T  together  imply  that  the 
dimension  of  the  Hilbert  space  of  the  signals  is  essentially  finite. 

If we require strictly finite duration and simultaneously maximum concentration of signal energy in a given bandwidth, we have a problem whose natural mathematical setting is the calculus of variations. This problem has been thoroughly discussed [10,5,40,41], even if its consequences have not received much attention from the signal designers yet. Let V be a Hilbert space with support the interval [0,T], and let the scalar product be defined as

$$(\varphi, \psi) = \int_0^T \varphi(t)\,\overline{\psi(t)}\,dt, \qquad \varphi(t), \psi(t) \in V$$

where overbar denotes complex conjugation.

The norm square $\|\cdot\|^2$, defined as $\|\varphi\|^2 = (\varphi, \varphi)$, represents the energy of the signal $\varphi(t) \in V$. In the set of linear operators acting in V and having a discrete spectrum, the operators associated to linear filters are of particular interest. Let $H(f)$ denote the filter transfer function. Then the Fourier transforms $\Phi(f)$ and $\Psi(f)$, respectively of the filter input and output signals, are related by

$$\Psi(f) = H(f)\,\Phi(f)$$

The problem now is to seek the input function $\varphi(t)$, of unit energy, for which the energy of the corresponding output function $\psi(t)$, in the bandwidth $[-W, W]$, is as large as possible. That is, we want to maximize the following integral

$$I_1 = \int_{-W}^{W} \Psi(f)\,\overline{\Psi(f)}\,df = \int_{-W}^{W} H(f)\,\overline{H(f)}\,\Phi(f)\,\overline{\Phi(f)}\,df$$

under the constraint

$$I_2 = \int_{-\infty}^{\infty} \Phi(f)\,\overline{\Phi(f)}\,df = 1$$

By means of Lagrange multipliers the solution is found to be the eigenfunction associated to the largest eigenvalue of the integral equation

$$\int_0^T K(t-s)\,\varphi(s)\,ds = \lambda\,\varphi(t), \qquad t \in [0,T] \tag{1}$$

where the positive definite kernel is defined by the Fourier transform

$$K(t-s) = \int_{-W}^{W} H(f)\,\overline{H(f)}\,\exp(2\pi j (t-s) f)\,df.$$

The positive eigenvalues $\lambda$, ordered in decreasing order, exhibit the typical trend shown in Fig. 1, which demonstrates that the dimension of the signal space of functions limited both in time and frequency is essentially finite and can be taken to be approximately 2WT [5]. (If 2WT > 10, this statement is true within an energy dispersion of some few per cent and irrespective of $H(f)$.)

Fig. 1 - Typical behavior of the eigenvalues of equation (1)


A natural orthogonal basis $B = \{\varphi_n(t)\}$, $n < 2WT$, for the space of the signals limited both in time and frequency is provided by the set of normalized eigenfunctions associated to the set of eigenvalues of greatest value. By means of the basis B, we can uniquely associate to a given set A of M signals

$$m_i(t) = \sum_{j=1}^{n} x_{ij}\,\varphi_j(t), \qquad i = 1, \dots, M$$

a set C of M vectors

$$X_i = (x_{i1}, \dots, x_{in}), \qquad i = 1, \dots, M$$

that we call a code. The square of the Euclidean length of a vector $X_i$ is equal to the energy of the signal $m_i(t)$.
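The correspondence between signals and code vectors can be checked numerically. The sketch below is only illustrative: instead of the eigenfunctions of equation (1) it assumes a simple pair of orthonormal sinusoids on [0, T], and the names `phi`, `integrate` and `m` are ours. It verifies that the energy of the signal equals the squared length of its vector, and that projecting the signal on the basis returns the coordinates.

```python
import math

T = 1.0   # signalling interval (illustrative value)
N = 1000  # number of midpoint-rule samples

def phi(j, t):
    """Two orthonormal functions on [0, T]; a stand-in for the basis B."""
    w = 2.0 * math.pi / T
    return math.sqrt(2.0 / T) * (math.cos(w * t) if j == 0 else math.sin(w * t))

def integrate(f):
    """Midpoint-rule approximation of the integral of f over [0, T]."""
    dt = T / N
    return sum(f((k + 0.5) * dt) for k in range(N)) * dt

x = (3.0, -4.0)  # code vector X_i (illustrative values)

def m(t):
    """Signal m_i(t) = sum over j of x_ij * phi_j(t)."""
    return sum(xj * phi(j, t) for j, xj in enumerate(x))

energy = integrate(lambda t: m(t) ** 2)  # signal energy
coeffs = [integrate(lambda t, j=j: m(t) * phi(j, t)) for j in range(2)]
```

With real-valued basis functions no complex conjugation is needed in the projection.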

We can now describe the operation of a quite general model of transmission scheme at the level of signal manipulation.

A transmitter associates to every source symbol, in a one-to-one way, a signal chosen in the set A and sends this signal through the channel. The channel operates by adding to the transmitted waveform m(t) a sample of a zero-mean random process v(t) with known spectral density. The received signal is thus

$$r(t) = m_\xi(t) + v(t), \qquad t \in [0,T]$$

where $\xi$ is a random variable taking values in the set {1,...,M}. If we confine ourselves to coherent detection, from the observation of r(t) over the interval [0,T], the receiver makes an estimate of the value taken by $\xi$, that is, an estimate of the symbol emitted by the source. Let us suppose that all the information relevant to every detection criterion lies in the signal space; therefore any decision can be taken by referring to the vector

$$r = (r_1, \dots, r_n)$$

where

$$r_j = \int_0^T r(t)\,\overline{\varphi_j(t)}\,dt.$$

This is equivalent to considering a discrete-time continuous-amplitude additive channel that produces

$$r = X_\xi + N$$

where N is a random vector with known probability density f(.), and $X_\xi$ is a transmitted code vector from the code C.

At the receiver end, the decision taker may be described by an exhaustive partition of the n-dimensional space into M' disjoint regions $R_j$, $j=1,\dots,M'$; if the received vector r falls in region $R_j$ then the detected symbol will correspond to the integer j. We say that the demodulator takes a "hard" decision or a "soft" decision depending on whether M'=M or M'>M respectively. In conclusion the channel is modelled by a discrete memoryless channel with M input symbols and M' output symbols.
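A hard-decision receiver of this kind can be sketched in a few lines. The example below assumes (it is not stated at this point of the text) that the regions $R_j$ are the nearest-neighbour regions of the code vectors, which is the maximum likelihood choice for white Gaussian noise; the four-point planar code, the noise level and the function names are illustrative.

```python
import math
import random

def nearest_codeword(r, code):
    """Index j of the code vector closest to r, i.e. the region R_j containing r."""
    return min(range(len(code)), key=lambda j: math.dist(r, code[j]))

# Illustrative code: M = 4 unit vectors in the plane.
code = [(math.cos(k * math.pi / 2), math.sin(k * math.pi / 2)) for k in range(4)]

def transmit(index, sigma, rng):
    """Discrete-time channel r = X + N with independent Gaussian components."""
    return tuple(x + rng.gauss(0.0, sigma) for x in code[index])

rng = random.Random(0)
trials = 2000
errors = sum(nearest_codeword(transmit(i % 4, 0.3, rng), code) != i % 4
             for i in range(trials))
error_rate = errors / trials  # empirical symbol error probability
```

Since the decoder returns one of the M code indices, this is a hard decision in the sense defined above (M' = M).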


III - MEASURES OF PERFORMANCE

The performance evaluations of group codes on communication channels rule the development of the entire theory of group codes. Hereafter we briefly review some important performance indices used in digital communication systems. In order to avoid discussions depending on transmission protocols, here and in the following we will deal only with transmission schemes based on hard decisions. In this context the most typical index is error probability, i.e. the probability that the receiver takes a wrong decision about the symbol emitted by the information source. Assuming in particular equienergetic codes, white Gaussian noise channel and maximum likelihood decision criterion at the receiver's end, the regions $R_i$, $i=1,\dots,M$, will be connected hypercones bounded by hyperplanes with the vertices in the origin. Therefore the error probability is given by a sum of n-dimensional integrals; letting $\bar{R}_i$ denote the complementary region of $R_i$ in $\mathbb{R}^n$ and $p\{X_i\}$ the probability of sending message i, we have

$$p(e) = \sum_{i=1}^{M} p\{X_i\} \int_{\bar{R}_i} f(X - X_i)\,dX$$


A second important index is the configuration matrix $C=(c_{ij})$ defined as the Gram matrix of the set of vectors, i.e.

$$c_{ij} = X_i^T X_j .$$

This matrix C occupies a central position in the theory of group codes. It conveys all the information relevant to evaluate code performances on the white Gaussian channel and is also useful to compute other performance indices.

A third relevant index is the minimum distance defined as the minimum distance between any pair of distinct vectors of the code, that is

$$d_{\min}^2 = \min_{i \ne j} \|X_i - X_j\|^2$$
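The last two indices are closely related, since $\|X_i - X_j\|^2 = c_{ii} + c_{jj} - 2c_{ij}$. The following sketch computes the configuration matrix of an illustrative code (eight equally spaced unit vectors in the plane, chosen by us as an example) and recovers the squared minimum distance from the matrix alone.

```python
import math

def gram_matrix(code):
    """Configuration matrix C with entries c_ij = X_i^T X_j."""
    return [[sum(a * b for a, b in zip(x, y)) for y in code] for x in code]

def min_sq_distance_from_gram(c):
    """Squared minimum distance, using |X_i - X_j|^2 = c_ii + c_jj - 2 c_ij."""
    m = len(c)
    return min(c[i][i] + c[j][j] - 2.0 * c[i][j]
               for i in range(m) for j in range(m) if i != j)

# Illustrative code: M = 8 unit vectors equally spaced on the circle.
code = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
        for k in range(8)]
C = gram_matrix(code)
d2_min = min_sq_distance_from_gram(C)
```

For this code adjacent vectors are separated by an angle of $\pi/4$, so the result should agree with $2 - 2\cos(\pi/4)$.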

The  evaluation  of  each  performance  index  is  usually  very  hard,  so  that 
frequently  the  knowledge  of  upper  and/or  lower  bounds  is  of  sufficient 
interest.  As  an  example  we  derive  an  upper  bound  for  the  error  probabi¬ 
lity,  that  applies  to  symmetric  point  configurations. 

Let us assume that the code has a symmetry such that the error probabilities conditioned on a given code vector do not depend on this vector, i.e.

$$p\{e\} = p\{e \mid X_i\}, \qquad i=1,\dots,M$$

Let the region $R_i$, $i=1,\dots,M$, be bounded by the set of s hyperplanes of equations

$$\|X - X_i\|^2 = \|X - X_j\|^2$$

where j belongs to a convenient subset of {1,...,M}; the explicit equation of each hyperplane turns out to be $X^T(X_i - X_j) = 0$.

Applying the union bound, we get a general upper bound for the error probability

$$p(e) = p\{e \mid X_1\} = \int_{\bar{R}_1} f(X - X_1)\,dX \le \sum_{j=1}^{s} \int_{Q_j} f(X - X_1)\,dX \le s \int_{Q_0} f(X - X_1)\,dX$$

where $Q_j$ is the halfspace defined by the inequality $X^T(X_1 - X_j) < 0$, $Q_0$ is the halfspace defined by the inequality

$$X^T(X_1 - X_0) < 0,$$

and $X_0$ is a code vector at the minimum distance from $X_1$.
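For white Gaussian noise the halfspace integral in the bound has a closed form. As a sketch, assume independent noise components of standard deviation $\sigma$ (the symbol $\sigma$ and the function names below are ours): the integral over $Q_0$ then equals $Q(d_{\min}/2\sigma)$, where $Q(\cdot)$ is the Gaussian tail function, so that $p(e) \le s\,Q(d_{\min}/2\sigma)$.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P{N(0,1) > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def union_bound(s, d_min, sigma):
    """Upper bound s * Q(d_min / (2 sigma)) on the error probability."""
    return s * q_function(d_min / (2.0 * sigma))

# Illustrative numbers: eight equally spaced unit vectors in the plane,
# each decision region bounded by s = 2 hyperplanes.
d_min = math.sqrt(2.0 - math.sqrt(2.0))
bound = union_bound(2, d_min, 0.1)
```

The bound is tight at high signal-to-noise ratio, where errors towards the nearest neighbours dominate.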


More  detailed  comments  on  performance  indices  will  be  provided  after 
the  description  of  the  main  features  of  group  codes. 


IV  -  GROUP  CODES 

Symmetry  seems  to  be  an  unavoidable  occurrence  for  reducing  the  comple¬ 
xity  of  every  high-dimensional  set  of  signals  as  required  by  Shannon's 
channel  theorem  to  guarantee  high  coding  performance.  For  instance,  we 
can  take  advantage  of  symmetry  in  designing  good  decoding  algorithms 
for  error  control  codes.  Symmetry  makes  feasible  the  new  digital  modu¬ 
lation  schemes  which  combine  error  control  codes  and  modulations. 

As  we  observed  in  the  introduction,  symmetry  cannot  be  separated  from 
the  notion  of  group  which  discloses  symmetry's  real  nature  and  con¬ 
stitutes  its  formal  counterpart.  It  was  early  in  the  fifties  that 
Slepian  introduced  the  group  codes  for  Gaussian  channels;  his  ideas 
found a definitive formulation in a stimulating paper [3], in 1965.

Now let us formally define the main object of this paper.

Definition 1.

Consider a finite set $S(G) = \{D(g): g \in G\}$ of real orthogonal matrices that form a faithful representation of a finite group G and consider an n-dimensional unit vector $X_1$. The set $S(G)X_1 = \{X_g = D(g)X_1 : g \in G\}$ of M vectors generated by the action of S(G) on $X_1$ is called a group code and denoted by [M,n] if it spans the n-dimensional space; otherwise it is called a planar group code.

In the present theory, group representations by matrices having real entries are a fundamental mathematical tool.

The theory of group representations originated in the middle of the nineteenth century from the works of many mathematicians. Equipped with the theory of group characters (the character of $g \in G$ is the trace of the matrix D(g)), the theory of matrix groups assumed a central role in the development of modern algebra. We do not try to survey this subject. To coding theorists we recommend the book by Blake and Mullin [12], while for a thorough development of the topic we refer to the books by Curtis and Reiner [24], Burrow [17] and van der Waerden [48]. Old fashioned but very rich and suggestive is the book by Burnside [16].

For  easy  reference  and  later  use  we  recall  some  results  concerning 
group  representations. 

1 - A group representation is either irreducible or completely reducible, i.e. it can be written as a direct sum of irreducible components.

2 - A representation with real entries may be either real reducible or real irreducible. In this second case it may still be complex reducible or not.

3 - The number of distinct irreducible components is equal to the number of group classes.

4 - Given two representations of groups G and G', we obtain a representation of their direct product by means of the direct matrix sum

$$D(g\,g') = D(g) \oplus D(g'), \qquad g \in G \text{ and } g' \in G'.$$

The concept of direct matrix sum is very important in describing the structure of group codes. The general observation fits a paradigmatic principle: in many instances to split a problem means to solve it.

Let |G| denote the cardinality of the group G. The cardinality M of the code may be less than or equal to |G|. In case it is less there exists a subgroup H of G such that the initial vector is left invariant, i.e.

$$H X_1 = X_1$$

where with $HX_1$ we denote the set $\{X: X = D(h)X_1,\ h \in H\}$.

The  proof  of  the  following  theorem  is  straightforward  and  follows  from 
definition  1  and  elementary  properties  of  the  groups. 

Theorem 1.

i) $|G| \ge M$ and $|G| \mid M!$

ii) if $|G| > M$ then $M \mid |G|$

where $d \mid b$ means that d is a divisor of b.

The following theorem concerning the subgroup H has an important consequence on the existence conditions for group codes. It is also useful to clarify the relations between the group and the code.

Theorem  2. 

The  subgroup  H  cannot  be  normal. 

See  [7,  12,  35]  for  a  proof. 
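Theorems 1 and 2 can be illustrated on a small example. The sketch below is our own choice of example: it takes the six 3 x 3 permutation matrices, which form a faithful orthogonal representation of the symmetric group on three letters, acting on the initial vector (1, 1, 0). The vector is left unnormalized so that the arithmetic stays exact; scaling by $1/\sqrt{2}$ does not change any count. The three resulting vectors span the three-dimensional space.

```python
from itertools import permutations
from math import factorial

def act(perm, x):
    """Apply the permutation matrix of perm to x (a permutation of coordinates)."""
    return tuple(x[perm[i]] for i in range(len(x)))

def compose(p, q):
    """Composition of two permutations given as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

group = list(permutations(range(3)))           # |G| = 6
x1 = (1, 1, 0)                                 # initial vector (unnormalized)
code = sorted({act(g, x1) for g in group})     # the distinct code vectors
M = len(code)
H = [g for g in group if act(g, x1) == x1]     # subgroup leaving x1 invariant
is_normal = all(compose(compose(g, h), inverse(g)) in H
                for g in group for h in H)
```

Here M = 3 divides |G| = 6, |G| divides M! = 6, and the subgroup H of order 2 is not normal, as Theorem 2 requires.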

Theorem  3. 

If G is abelian then |G| = M.

Besides the abstract properties of the group G, also conditions concerning the skeleton of its representations are important for distinguishing between planar and non planar codes.

In order that an initial vector exists such that the generated set of vectors spans the n-dimensional space, the representations of the group G must satisfy the condition expressed in the following theorem.

Theorem 4.

Given an n-dimensional representation D(g) of a group G, a vector $X_1 \in E^n$ exists such that the set $\{D(g)X_1,\ g \in G\}$ of vectors spans $E^n$ if and only if every irreducible representation contained in D(g) appears with a multiplicity less than or equal to its dimension.

For a proof see Blake and Mullin [12].

Definition 2.

A representation is said to be full homogeneous if every irreducible component has a multiplicity equal to its dimension.

The symmetry of a group code is exploited by the configuration matrix. According to the previous definition, it is an M by M matrix of rank n the entries of which are the scalar products $c_{ij} = X_i^T X_j$, $i,j=1,\dots,M$. It is also of interest to define an extended configuration matrix $C^e$ whenever |G|>M. Let $X_g = D(g)X_1$ be the vector produced by the action of the element $g \in G$. We define the extended configuration matrix as the |G| by |G| Gram matrix whose entries are

$$c_{gg'} = X_g^T X_{g'}, \qquad g, g' \in G.$$

Since $H \ne \{e\}$, the vectors of the set $S(G)X_1$ are not all distinct; in fact the same vector appears with multiplicity |H|.

The following theorem illustrates the shape and structure of configuration matrices which rely in depth on the associated group.

Theorem 5.

The rows of any configuration matrix of a group code are permutations of the first one.

This applies to both extended and not extended configuration matrices. For a proof see [3] and [10].

It is not hard to verify that the extended configuration matrix $C^e$ is the Kronecker product of C by a matrix J (possibly with a re-ordering of rows and columns):

$$C^e = C \otimes J$$

where J is a convenient matrix of which all entries are 1s.
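Both Theorem 5 and the Kronecker structure can be checked on a small case. The sketch below is an illustrative example of ours: the permutation matrices of the symmetric group on three letters act on the unnormalized initial vector (1, 1, 0), for which |G| = 6, M = 3 and |H| = 2. The group elements are ordered so that elements producing the same vector are adjacent, which is the re-ordering of rows and columns mentioned above.

```python
from itertools import permutations

def act(perm, x):
    """Apply the permutation matrix of perm to x (a permutation of coordinates)."""
    return tuple(x[perm[i]] for i in range(len(x)))

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

group = list(permutations(range(3)))
x1 = (1, 1, 0)

# Re-order the group so that equal vectors X_g sit next to each other.
ordered = sorted(group, key=lambda g: act(g, x1))
vectors = [act(g, x1) for g in ordered]     # |G| vectors, each repeated |H| times
code = sorted(set(vectors))                 # the M distinct code vectors

C = [[dot(x, y) for y in code] for x in code]          # M x M
Ce = [[dot(x, y) for y in vectors] for x in vectors]   # |G| x |G|

h = len(group) // len(code)                 # |H|
# Kronecker product of C by the h x h all-ones matrix J.
kron = [[C[i // h][j // h] for j in range(len(group))]
        for i in range(len(group))]
rows_are_permutations = all(sorted(row) == sorted(C[0]) for row in C)
```

In this example J is the 2 x 2 matrix of ones, and `Ce` coincides with `kron` entry by entry.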

The importance of the configuration matrix C of a group code was enhanced by Slepian's proof [3] that it is possible to recover the vectors of the code from C. Let $P_H(g)$, $g \in G$, denote the permutation matrices of the right permutation representation of G induced by its subgroup H; let $A_G(H)$ be the group algebra of G generated by these permutation matrices, and let $A_Z(H)$ be the centralizing algebra of $A_G(H)$. We have the following theorems.

Theorem 6.

The extended configuration matrix of a group code can be written as the sum

$$C^e = \sum_{g \in G} c(g)\,L(g)$$

where L(g), $g \in G$, are the permutation matrices of the left regular permutation representation of G.

Theorem 7. (Slepian)

The extended configuration matrix commutes with all the permutation matrices of the right regular permutation representation of G, i.e. $C^e$ belongs to the centralizing algebra of the group algebra of the right regular permutation matrices.

The configuration matrices of different group codes generated by different irreducible representations of the same group G may originate an orthogonal basis in the regular group algebra $A_G(\{e\})$, as stated in the following theorem due to Blake.

Theorem 8.

Let D(g) and D'(g) be real irreducible representations of the finite group G of dimensions $n_i$ and $n_j$ respectively, and $C_i$ and $C_j$ the configuration matrices of the group codes $\{D(g)X_i,\ g \in G\}$ and $\{D'(g)X_j,\ g \in G\}$, respectively. Then

i) if D(g) and D'(g) are not equivalent, then $C_i C_j = 0$ for any $X_i$ and $X_j$;

ii) if D(g) = D'(g) and $X_i = X_j$, then $(C_i)^2 = (|G|/n_i)\,\|X_i\|^2\,C_i$.

For a proof see Blake and Mullin [12].
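Theorem 8 can be checked numerically on a cyclic group. The sketch below is an illustrative example of ours: G is the cyclic group of order 5, and the two representations are the plane rotations by multiples of $2\pi/5$ and of $4\pi/5$, which are real irreducible, two-dimensional and not equivalent to each other. The initial vector is (1, 0) in both cases.

```python
import math

M = 5                       # order of the cyclic group G
n = 2                       # dimension of both representations
theta = 2.0 * math.pi / M

def config_matrix(step):
    """Configuration matrix of the code obtained by rotating (1, 0) by k*step*theta."""
    pts = [(math.cos(k * step * theta), math.sin(k * step * theta))
           for k in range(M)]
    return [[x[0] * y[0] + x[1] * y[1] for y in pts] for x in pts]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(M)) for j in range(M)]
            for i in range(M)]

C1 = config_matrix(1)       # representation D(g)
C2 = config_matrix(2)       # inequivalent representation D'(g)

cross = matmul(C1, C2)      # part i): expected to vanish
square = matmul(C1, C1)     # part ii): expected to equal (|G|/n) * C1

err_i = max(abs(v) for row in cross for v in row)
err_ii = max(abs(square[i][j] - (M / n) * C1[i][j])
             for i in range(M) for j in range(M))
```

Both residuals should be zero up to floating-point rounding.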

Furthermore  special  structures  of  the  configuration  matrix  may  uniquely 
characterize  the  group  code. 

Theorem 9. (Blake)

Let us consider the configuration matrix C of an [M,n] code in which all entries of the first row are distinct.

Then C is the configuration matrix of a group code if and only if:

i) its rows are permutations of the first one;

ii) M is a power of 2, i.e. $M = 2^s$;

iii) in the decomposition

$$C = \sum_i c_i P_i$$

the matrices $P_i$ are permutation matrices of order two and commute with each other.

Moreover $n \ge s$ and the group generating the code is commutative of type (1,1,...,1).


Now we can devise a general theorem concerning the conditions for a given Gram matrix to be the configuration matrix of a group code. However the formulation of such general conditions may be quite unsatisfactory, because they lack either classical mathematical fascination or practical utility. It is a challenging question to find more pleasant and possibly useful conditions.

Theorem 10.

A Gram matrix C is the configuration matrix of a group code if and only if

i) rows of C are permutations of the first one;

ii) a matrix J, all entries of which are 1s and the order of which is not greater than (M-1)!, exists such that the matrix $C' = C \otimes J$ commutes with all matrices of a right regular representation of a group G.

See [10] for a proof.

We  stop  here  the  presentation  of  Slepian's  group  codes.  In  the  next 
section  we  shall  consider  an  extension  that  will  include  multilevel 
codes  which  share,  of  course,  the  same  underlying  property  of  symmetry. 


V  -  GENERALIZED  GROUP  ALPHABETS 

The  class  of  multidimensional  alphabets  is  introduced.  Special  instan¬ 
ces  of  these  codes  have  been  widely  used  for  designing  multidimensional 
signals  in  combined  modulation  and  coding.  Their  structure  is  very  rich 
in  symmetries  and,  as  far  as  we  know,  most  of  the  signal  constellations 
in  actual  use,  either  equienergetic  or  not,  belong  to  this  family. 

Definition 3.

Consider a set of K n-vectors $X = \{X_1, \dots, X_K\}$, called the initial set, and L orthogonal n x n matrices $S_1, \dots, S_L$ that form a representation S(G) of the group G. The set of vectors $S(G)X_1, \dots, S(G)X_K$ obtained from the action of S(G) on the vectors of the initial set is called a Generalized Group Alphabet, and from now on shortened to GGA.

Definition 4.

A GGA is called separable if the vectors of the initial set are transformed by S(G) into either disjoint or coincident vector sets, i.e.,

$$S(G)X_j \cap S(G)X_k = \begin{cases} \emptyset & j \ne k \\ S(G)X_j & j = k \end{cases}$$

Since an orthogonal matrix transforms a vector into one with the same length, the signals associated with a GGA have as many energy levels as there are in the initial set.

Definition  5. 

A  GGA  is  called  regular  if  the  number  of  vectors  in  each  subalphabet 
S(G)Xj,  j=l,...,K,  does  not  depend  on  j,  i.e.,  each  vector  of  the 
initial  set  is  transformed  by  S(G)  into  the  same  number  of  distinct 
vectors.  A  regular  GGA  is  called  strongly  regular  if  each  set  S(G)Xj 
contains  exactly  L  distinct  vectors. 

The  following  result  stems  directly  from  the  definitions. 

Theorem  11. 

The number M of vectors in a regular GGA is a multiple of K. If the GGA is strongly regular, then M=KL.
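As an illustration of Definitions 3 to 5 and of Theorem 11, the sketch below builds a familiar two-dimensional constellation as a GGA. The example is ours: the group is the cyclic group of the four rotations by multiples of 90 degrees (L = 4), and the initial set contains the K = 4 first-quadrant points with coordinates in {1, 3}.

```python
def rotate90(v, k):
    """Rotate the integer vector v by k * 90 degrees (an orthogonal 2 x 2 matrix)."""
    x, y = v
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

L = 4                                            # order of the rotation group
initial_set = [(1, 1), (3, 1), (1, 3), (3, 3)]   # K = 4 initial vectors
K = len(initial_set)

# Subalphabets S(G)X_j, one per initial vector.
subalphabets = [{rotate90(v, k) for k in range(L)} for v in initial_set]
alphabet = set().union(*subalphabets)
M = len(alphabet)

separable = all(subalphabets[i].isdisjoint(subalphabets[j])
                for i in range(K) for j in range(i + 1, K))
strongly_regular = all(len(s) == L for s in subalphabets)
energies = sorted({x * x + y * y for (x, y) in alphabet})
```

The alphabet obtained is the 16-point square grid with coordinates in {-3, -1, 1, 3}; it is separable and strongly regular, so M = KL = 16, and it has the same three energy levels as the initial set.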

We consider now some distance properties of the elements of a GGA.

Choose a partition of a GGA into m subsets $Z_1, Z_2, \dots, Z_m$. For each subset $Z_j$, we can define the intradistance set as the set of all the Euclidean distances among pairs of vectors in $Z_j$. For any pair of distinct subsets $Z_i$, $Z_j$, we define their interdistance set as the set of all the Euclidean distances between a vector in $Z_i$ and a vector in $Z_j$.

Definition 6.

The partition of a separable GGA into m subsets $Z_1, \dots, Z_m$ is called fair if all the subsets are distinct, include the same number of vectors and their intradistance sets are equal.

We  shall  now  present  a  constructive  method  to  generate  fair  partitions 
of  a  GGA.  Consider  the  generating  group  S(G)  of  the  GGA,  one  of  its 
subgroups,  say  S(H),  and  the  partition  of  S(G)  into  left  cosets  of 
S(H).  We  have  the  following  result. 

Theorem  12. 

If  the  left  cosets  of  the  subgroup  S(H)  are  applied  to  the  initial  set 
of  a  strongly  regular  GGA,  this  procedure  results  in  a  fair  partition 
of  the  GGA.  Under  the  same  hypotheses,  if  S(H)  is  a  normal  subgroup, 
then  left  and  right  cosets  give  rise  to  the  same  fair  partition. 

For a proof see [11].
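As an illustration of Theorem 12, the sketch below builds a fair partition for a simple case chosen here for convenience, not taken from the original: the 8-PSK constellation, viewed as a strongly regular GGA with the single initial vector (1, 0) and the cyclic group $C_8$ of planar rotations, partitioned by the cosets of its subgroup $C_4$. The helper names `psk_point`, `left_cosets` and `intradistance_set` are hypothetical.

```python
import cmath
import math

def psk_point(h, order=8):
    """Rotate the initial vector (1, 0) by h steps of 2*pi/order (complex form)."""
    return cmath.exp(2j * math.pi * h / order)

def left_cosets(order, subgroup_step):
    """Cosets of the subgroup {0, step, 2*step, ...} of the cyclic group Z_order.

    Assumes subgroup_step divides order. The group is abelian, so left and
    right cosets coincide.
    """
    subgroup = range(0, order, subgroup_step)
    return [[(shift + h) % order for h in subgroup] for shift in range(subgroup_step)]

def intradistance_set(points, digits=9):
    """Sorted set of Euclidean distances among pairs of points.

    Rounding merges values that differ only by floating-point noise.
    """
    return sorted({round(abs(p - q), digits)
                   for i, p in enumerate(points) for q in points[i + 1:]})

cosets = left_cosets(order=8, subgroup_step=2)            # C_4 inside C_8
partition = [[psk_point(h) for h in coset] for coset in cosets]
distance_sets = [intradistance_set(subset) for subset in partition]
```

Both subsets are 4-PSK constellations with the same size and the same intradistance set, which is the fairness property of Definition 6. Calling `left_cosets(8, 4)` instead splits the alphabet into four antipodal pairs.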

The condition of strong regularity of the GGA can be removed, but in
this case different cosets may generate the same element of the
partition, so some of the cosets must be removed from consideration.
Moreover, if S(H) is a normal subgroup of S(G), we do not need to
distinguish between left and right coset partitions. If, on the
contrary, S(H) is not normal, the partitions obtained from right cosets
may not be fair, as can be shown by a counterexample. In some cases, we
are interested in further partitioning every element into the same
number of subsets. This leads to the concept of a chain partition: the
GGA is partitioned into subsets, which in turn are partitioned into the
same number of sub-subsets, and so on. We call the level of a subset in
the chain partition the number of inclusions between the given subset
and the whole group code.

Definition 7.

The chain partition of a separable GGA is called fair if any two
elements of the partition at the same level of the chain include the
same number of vectors and have equal intradistance sets.

For fair chain partitions we have the following theorem.

Theorem 13.

Consider a strongly regular GGA, and a chain of subgroups of its
generating group S(G), that is

$$S(H_1) \subset S(H_2) \subset S(H_3) \subset \cdots \subset S(H_s) = S(G).$$

Use $H_{s-1}$ and its left cosets to generate a partition of the GGA. Then
use $H_{s-2}$ and its left cosets in $H_{s-1}$ to further partition all the
sets of the previous partition. Repeat the procedure with $H_{s-3}$, and so
on, until $H_1$ and its left cosets in $H_2$ are used. The resulting chain
partition of the GGA is fair.

A  theorem  concerning  the  interdistance  sets  sheds  some  further  light  on 
the  symmetry  properties  of  GGA's. 

Theorem  14. 

Let  H  be  a  normal  subgroup  of  G.  The  partition  of  a  strongly  regular 
GGA  obtained  by  applying  the  left  cosets  of  H  to  the  initial  set  X  has 
the  following  property:  the  interdistance  set  associated  with  any  two 
cosets, say $S_1H$ and $S_2H$, is a function only of the coset $S_3H$, where
$S_3 = S_1^{-1}S_2$, and not of $S_1$, $S_2$ separately.

For a proof see [11].

We conclude this section by showing how GGAs, in particular group
codes, can be used in conjunction with error control codes to exploit
the channel capacity further. We shall first illustrate the joint use
of multidimensional alphabets and block codes; then we will describe
how the signal alphabets are paired with convolutional (trellis) codes.




Imai and Hirakawa [ii] and recently Ginzburg [SI] have described
constructions which make it possible to design sets of signals with a
regular structure and with an arbitrary minimum distance, as ensured by
the algebraic properties of block codes. Ginzburg's construction
considers L block encoders $C_1, C_2, \ldots, C_L$, which accept source
symbols and output L blocks $(q_{1i}, q_{2i}, \ldots, q_{Ni})$, $i=1,\ldots,L$, of N
symbols each. The modulator maps each L-tuple $(q_{j1}, \ldots, q_{jL})$,
$j=1,\ldots,N$, into the vector

$$X_j = f(q_{j1}, \ldots, q_{jL}), \qquad j = 1, \ldots, N$$

chosen from a GGA of $M = M_1 \cdots M_L$ elements. This mapping is obtained as
follows. In the GGA we define a system of L partitions such that each
class of the $\ell$-th partition includes $M_\ell$ classes of the $(\ell-1)$-th
partition. Each class will consist of $M(\ell) = M_1 M_2 \cdots M_\ell$ signals. By
numbering the classes of the $(\ell-1)$-th level occurring in a class of the
$\ell$-th level we can obtain a one-to-one mapping of the set of classes of
the $(\ell-1)$-th partition onto the set of integers $\{0, \ldots, M_\ell - 1\}$.
Therefore, if the $q_{j\ell}$ are chosen in the set $\{0, \ldots, M_\ell - 1\}$,
$\ell = 1, \ldots, L$, any L-tuple $(q_{j1}, \ldots, q_{jL})$ defines a unique value of
the j-th elementary signal $X_j = f(q_{j1}, \ldots, q_{jL})$.


We shall now see how an Ungerboeck code can be designed using a GGA. The
procedure suggested in [47] and called "mapping by set partitioning"
can be achieved by the notion of fair partition, which represents a
systematic generalization of that concept.

Each coded symbol depends on $k+\nu$ source bits, namely the block
$x = (a_1, \ldots, a_k)$ of k bits generated by the source, plus $\nu$ bits
preceding this block. The $\nu$ bits determine one of the $N = 2^\nu$ states of
the encoder, say $\sigma = (a_{k+1}, \ldots, a_{k+\nu})$, $a_n = 0, 1$. The encoder
state for the next coded symbol is obtained by shifting the $a_n$'s k
places to the right, dropping the right-most k bits and inserting on
the left the most recent k source bits. The encoded symbol $X_j$, which is
an element of a GGA, depends on x and $\sigma$ and, in this framework, the
encoding procedure can be described using a trellis and by assigning to
the branches outgoing from each node the set of symbols obtained from a
fair partition of a GGA.
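The state update described above amounts to a shift register. The following minimal sketch shows only that update, under the assumptions that bits are stored as tuples with the most recent bit on the left and that the encoder starts in the all-zero state; the names `next_state` and `run_states` are hypothetical, and the mapping from (x, σ) to a GGA symbol is not modeled.

```python
def next_state(state, block):
    """Return the encoder state after the k-bit block has been shifted in.

    state: tuple of nu bits (a_{k+1}, ..., a_{k+nu});
    block: tuple of k bits (a_1, ..., a_k), the most recent source bits.
    The register contents move k places to the right, the right-most k
    bits fall off, and the new block enters on the left.
    """
    nu = len(state)
    return (tuple(block) + tuple(state))[:nu]

def run_states(blocks, nu):
    """List the states visited from the all-zero state (an assumed start)."""
    state = (0,) * nu
    visited = [state]
    for block in blocks:
        state = next_state(state, block)
        visited.append(state)
    return visited
```

With ν = 2 and k = 1, for example, `run_states([(1,), (0,), (1,)], 2)` walks through four of the trellis nodes.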


VI  -  THE  INITIAL  VECTOR  PROBLEM 

The minimum distance is a relevant factor in defining the code
performance on noisy channels, because distant signals are hard to
confuse under the effect of noise. Moreover, monotone decreasing
functions of the minimum distance provide upper bounds to the error
probability. It follows that codes with large minimum distances are
desirable, and in particular the choice of Slepian's group codes with
the greatest minimum distance leads to the initial vector problem,
which is also interesting from a geometrical point of view.

The initial vector problem for group codes can be stated as follows:
given a finite group S(G) of orthogonal matrices that generates a group
code [M,n] by operating on an initial unit vector X, among all such
vectors X find the vector $X_0$ for which the minimum distance is the
greatest possible. We have to find the maximum of the minimum of the
distances, i.e. to determine a kind of saddle point with respect to the
continuous variable X and discrete variable g:

$$\max_{X} \left( \min_{g \neq g'} d\big(D(g')X, D(g)X\big) \right)$$

where the maximum is taken over all the vectors of $R^n$ with the
constraints $\|X\| = 1$ and S(H)X=X. S(H) is a subgroup of S(G), possibly H={e}.
At the present time no general solution is known. The problem has been
solved for many classes of group codes and for codes generated by
special representations. Djokovic and Blake [25] settled the case of
full homogeneous component; Downey and Karlof found all the optimal
group codes in three dimensions [28]; Biglieri and Elia identified the
optimal initial vector for Variant I permutation codes [9], and
showed that for cyclic codes [8] as well as for abelian codes the
optimal initial vector is obtained by solving a linear programming
problem. Nevertheless, the evidence so far is that the problem cannot
have, in general, a closed form solution.

We do not digress on the meaning of "solution", but we adopt the
pragmatic view that for practical purposes any kind of numerical
solution should be regarded as a valid one.

For computational approaches the initial vector problem can be stated,
in general, as a mathematical program with a quadratic objective
subject to quadratic constraints [37].

Let $d^2$ be the minimum square distance. The optimal initial vector $X_1$ is
the solution to:

$$d^2 = \max_{X_1} \min_{g} d^2\big(D(g)X_1, X_1\big)$$

where the maximum is taken over all unit vectors and the minimum is over
all elements $g \in G$ different from the identity.

For any unit vector X and unitary matrix D(g), we have

$$d^2(D(g)X, X) = 2 - 2\,(D(g)X, X).$$

Thus maximizing the minimum distance is equivalent to minimizing the
maximum inner product. We may assume the maximum inner product positive
and equal to $r^2$. Let $Y = (1/r)X_1$. Then, for all non-identity elements of
G, $(D(g)Y, Y) \le 1$ and $(Y, Y) = 1/r^2$. Hence Y is a solution to:

Find Max (Y,Y)

subject to $(D(g)Y, Y) \le 1$ whenever g is not the identity in G.
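The two facts used in this reformulation can be written out in a few lines. The expansion below is only a worked restatement, assuming real vectors (so that the inner product is symmetric) and orthogonal D(g), which preserves the norm:

```latex
\begin{aligned}
d^2(D(g)X, X) &= \|D(g)X - X\|^2 \\
              &= \|D(g)X\|^2 + \|X\|^2 - 2\,(D(g)X, X) \\
              &= 2 - 2\,(D(g)X, X), \\[4pt]
(D(g)Y, Y) &= \tfrac{1}{r^2}\,(D(g)X_1, X_1) \le \tfrac{r^2}{r^2} = 1, \qquad
(Y, Y) = \tfrac{1}{r^2}\,(X_1, X_1) = \tfrac{1}{r^2}.
\end{aligned}
```

Making (Y,Y) as large as possible therefore makes $r^2$, the largest inner product, as small as possible, and with it the minimum distance $d^2 = 2 - 2r^2$ as large as possible.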

The problem of the initial set of vectors for a GGA is more complicated,
of course, than for group codes because more than one vector is to be
found and different objectives may motivate the choice. In this case
one formulation of the initial set problem is the following:
Given S(G) find a set $\{X_1, \ldots, X_K\}$ of K n-dimensional
vectors with average square norm equal to E, such that
the generated GGA is regular and such that the minimum
distance is as large as possible.
Here we do not treat the subject further, as the discussion would be
very long. For example, a GGA used in conjunction with error control
codes should have the maximum possible minimum intradistance associated
with a given fair partition.

In this context the open problems are countless; the few known
solutions are either heuristic or obtained by hand manipulations. Much
work must still be done.


VII  -  THE  CONSTRUCTIVE  VIEW 


One important intent of group code theory is to produce good point
constellations for the design of digital signals to be used in data
transmission, vector quantization, pattern recognition and many other
fields. A second, ambitious objective of this theory is the systematic
classification and construction of all regular point constellations
in n-dimensional spaces. Before discussing the capabilities of the
constructive methods of group coding theory, we present three
interesting point constellations that have large minimum distances and
provide good instances of this matter.

The first example is given by the [8,3] group code, the classical
constellation shown in Fig. 2 (edges connect points at minimum
distance), which has a minimum distance slightly greater than that of
the cube. It is generated by the action of the representation of the
cyclic group $C_8$.

The group is generated by:

$$D(g) = (-1)^h \oplus \begin{pmatrix} \cos(\pi h/4) & \sin(\pi h/4) \\ -\sin(\pi h/4) & \cos(\pi h/4) \end{pmatrix}$$

The initial vector is $\left(\sqrt{1/(2\sqrt{2}+1)},\ \sqrt{2\sqrt{2}/(2\sqrt{2}+1)},\ 0\right)$.

The minimum distance is $d^2_{\min} = 4/(2 + 1/\sqrt{2}) > 4/3$.
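The figures quoted for this example can be checked numerically. The sketch below generates the eight codewords from the representation and initial vector given above, evaluates the minimum squared distance, and then runs a coarse grid search over unit initial vectors of the form (cos t, sin t, 0). Restricting the search to that family is an assumption of the sketch (a rotation in the last two coordinates commutes with every D(g), so the third coordinate can be set to zero); the function names and the grid size are hypothetical.

```python
import math

def c8_matrix(h):
    """3x3 matrix (-1)^h (+) rotation by pi*h/4, as in the text."""
    c, s = math.cos(math.pi * h / 4), math.sin(math.pi * h / 4)
    return [[(-1) ** h, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

def apply(matrix, vec):
    """Matrix-vector product for 3x3 lists."""
    return tuple(sum(row[j] * vec[j] for j in range(3)) for row in matrix)

def min_sq_distance(x):
    """Minimum squared distance of the group code generated from x."""
    code = [apply(c8_matrix(h), x) for h in range(1, 9)]
    return min(sum((a - b) ** 2 for a, b in zip(p, q))
               for i, p in enumerate(code) for q in code[i + 1:])

r2 = math.sqrt(2)
x_paper = (math.sqrt(1 / (2 * r2 + 1)), math.sqrt(2 * r2 / (2 * r2 + 1)), 0.0)
d2_paper = min_sq_distance(x_paper)

# Coarse max-min search over initial vectors (cos t, sin t, 0), 0 < t < pi/2.
best_d2, best_t = max(
    (min_sq_distance((math.cos(t), math.sin(t), 0.0)), t)
    for t in (k * math.pi / 2000 for k in range(1, 1000))
)
```

In this family the grid search does not find any initial vector that beats the quoted one, and its best value approaches `d2_paper` as the grid is refined.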
Fig. 2


The second example is a non-regular and non-equienergetic GGA having 14
points in 3 dimensions. The configuration, shown in Fig. 3, is generated
by the action of a representation of the group of the cube

$C_2 \times C_2 \times C_2 \times S_3$.

The initial set is {(u, 0, 0), (v, v, v)}, where

$$v = \sqrt{7(7 - 2\sqrt{2})/123}, \qquad u = \sqrt{7(13 + 8\sqrt{2})/123}.$$

The minimum distance is $d^2_{\min} = 28(7 - 2\sqrt{2})/123 = 0.9496$ and it is
significantly greater than 0.93386, the minimum distance of the best
known spherical 14 point configuration.
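A quick numerical check of this example is sketched below. It assumes that the representation of the group of the cube acts by all coordinate permutations combined with all sign changes (the 48 signed permutation matrices); the helper names are hypothetical.

```python
import itertools
import math

def cube_group_orbit(vectors):
    """Images of the given vectors under all signed coordinate permutations."""
    points = set()
    for vec in vectors:
        for perm in itertools.permutations(range(3)):
            for signs in itertools.product((1, -1), repeat=3):
                # Rounding merges images that differ only by float noise.
                points.add(tuple(round(signs[i] * vec[perm[i]], 12) for i in range(3)))
    return sorted(points)

r2 = math.sqrt(2)
u = math.sqrt(7 * (13 + 8 * r2) / 123)
v = math.sqrt(7 * (7 - 2 * r2) / 123)

gga = cube_group_orbit([(u, 0.0, 0.0), (v, v, v)])
d2_min = min(sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in itertools.combinations(gga, 2))
average_energy = sum(sum(c * c for c in p) for p in gga) / len(gga)
```

The orbit has 6 points of energy u² and 8 points of energy 3v², which is why the alphabet is neither regular nor equienergetic; with these values of u and v the average energy comes out as 1.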
Finally, the third and last example is the [16,4] group code generated
by the action of a representation of the abelian group $C_2 \times C_8$. The
configuration is shown in Fig. 4. The representation is generated by

$$D(g) = (-1)^k \oplus (-1)^{h+k} \oplus \begin{pmatrix} \cos(\pi h/4) & \sin(\pi h/4) \\ -\sin(\pi h/4) & \cos(\pi h/4) \end{pmatrix}, \qquad k = 1, 2; \quad h = 1, \ldots, 8.$$

The initial vector is $\left(\sqrt{(\sqrt{2}-1)/2},\ \sqrt{(\sqrt{2}-1)/2},\ \sqrt{2-\sqrt{2}},\ 0\right)$.

The minimum distance is $d^2_{\min} = 2(2 - \sqrt{2}) = 1.1716$.

Note that one of the most used point constellations, the
two-dimensional 16-QAM, has minimum square distance 2/5 = 0.4.
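The comparison above can be reproduced with a short computation. The sketch generates the 16 codewords of the [16,4] example and, for reference, a 16-QAM constellation on the grid {±1, ±3}² scaled to unit average energy; that grid and the normalization are the usual conventions and are assumed here, and the function names are hypothetical.

```python
import itertools
import math

def c2c8_apply(k, h, x):
    """Apply (-1)^k (+) (-1)^(h+k) (+) rotation by pi*h/4 to a 4-vector."""
    c, s = math.cos(math.pi * h / 4), math.sin(math.pi * h / 4)
    return ((-1) ** k * x[0], (-1) ** (h + k) * x[1],
            c * x[2] + s * x[3], -s * x[2] + c * x[3])

def min_sq_distance(points):
    """Minimum squared Euclidean distance over all pairs of points."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in itertools.combinations(points, 2))

r2 = math.sqrt(2)
a = math.sqrt((r2 - 1) / 2)
x0 = (a, a, math.sqrt(2 - r2), 0.0)
code16 = [c2c8_apply(k, h, x0) for k in (1, 2) for h in range(1, 9)]
d2_code = min_sq_distance(code16)

# 16-QAM on the grid {-3, -1, 1, 3}^2, normalized to unit average energy.
qam = [(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
qam_energy = sum(i * i + q * q for i, q in qam) / len(qam)
d2_qam = min_sq_distance(qam) / qam_energy
```

Both alphabets carry 16 points at unit average energy, so the ratio `d2_code / d2_qam` is a direct measure of the gain in minimum squared distance obtained by moving from two to four dimensions.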


Fig. 4

The ingredients involved in the constructive aspect of group codes are
groups, matrices and imagination. Four remarkable achievements are
particularly important:

1)  an  old  theorem  by  Jordan  stating  that  the  number  of  finite 

groups  with  trivial  maximal  normal  abelian  subgroup,  which 
have  an  irreducible  representation  of  dimension  n,  is  finite 
and  upper  bounded  by  b(n)=  n!  (n-l)+2^  where  n(n) 

counts  the  number  of  primes  less  than  n; 

2)  the  recent  classification  of  all  finite  simple  groups; 

3)  the fact that the number of finite groups of given order is
finite;

4)  the complete classification of all commutative groups as
well as their representations.

Finite simple groups, Galois' fundamental discovery, are instrumental
in building up all other groups and their representations. Abelian
groups together with finite groups having trivial center can be used to
classify all groups which have a representation in n-dimensional
spaces. In this context it is useful to recall the outstanding theorem
of the classification of finite simple groups, completed in 1981. This
theorem resulted from the efforts of several hundred mathematicians
from all over the world over a period of 100 years. It is remarkable in
itself and relevant to the classification of group codes.

Theorem 15. [20]

The  finite  simple  groups  are  to  be  found  among: 

i)  the  cyclic  groups  Cp  of  prime  order  p. 

ii)  the  alternating  groups  of  degree  n  at  least  5. 

iii)  the  Chevalley  groups 

iv)  the  Tits  group 

v)  the  26  sporadic  simple  groups. 

The  Mathieu  group,  usually  denoted  by  M24,  played  a  central  role  in  the 
discovery  of  all  26  sporadic  groups.  M24  is  also  important  in  the 
theory  of  error-correcting  codes,  because  it  is  the  automorphism  group 
of  the  Golay  code  (24,12,8),  the  only  binary  perfect  multiple  error 
correcting  code;  see  [39,49,21]. 

Even  if  it  is  not  necessary  to  resort  to  the  above  definitive  theorem, 
simple  groups  play  a  basic  role  in  group  codes. 

Theorem  16. 

Let us consider an [M,n] group code generated by a group G through
its representation D(g). If M is a prime number then the group code is
generated by a cyclic subgroup of G.


Theorem  17. 

No [M,n] group code exists if M is an odd prime and n is odd.

Theorem 18.

A  [M,n]  group  code  can  be  constructed  using  representations  of  a 
cyclic  group  provided  that  either 
i)  n  is  even  and  M>2 
or 

ii)  n  is  odd  and  M  is  even. 

Theorem  19. 

The number of [M,n] group codes generated by irreducible
representations of groups with trivial maximal normal abelian subgroup
is finite and bounded by a function of n alone.

Concluding this section we remark that the problem of the existence of
group codes for every M and n is very interesting as it concerns the
existence of regular configurations of points on n-dimensional spheres,
and generalizes the vertex configurations of regular polytopes.

We can summarize the results as follows:

a)  n even, M > n+1:        at least one group code generated by a
                            cyclic group of order M exists

b)  n odd, M even > n+1:    at least one group code generated by a
                            cyclic group of order M exists

c)  n even, M odd prime:    only one group code generated by the
                            cyclic group of order M exists

d)  n odd, M odd prime:     no group code exists

e)  n = 3, any M:           all group codes have been classified by
                            Downey and Karlof. No group codes with M
                            odd exist.
The definitive classification of all group codes is far from complete,
so that many open problems and conjectures still deserve attention.
Most of these problems are appealing and may produce beautiful results.
We recall, by way of example, two interesting problems that are still
open:

One group code in dimension 5 with M=15 is known to exist [26].
It is conjectured that it is the only group code in five-dimensional
space with an odd number of points.

Brauer [15] and his school have reached the classification of all
groups having an irreducible representation in dimension 4 and 5.
It would be interesting to find out all group codes in dimension 4
(the useful dimension for today's applications).

The determination of all group codes [M,5] would also be
interesting, as well as the classification of [M,7]. The latter is
possible due to the complete list of groups with irreducible
representation in dimension 7 obtained by Wales [50, 51, 52].


VIII.  CONCLUSIONS 

The impact of ancient and modern mathematical concepts on the
manipulation, transmission and storage of information has made a
science of fine, intelligent but scattered techniques.

In this paper we reported on group code theory as an application of
general results originating from ancient geometry. The geometric
view provides the appropriate framework for dealing with digital signal
processing, signal design, vector quantization and communication
systems in general. To emphasize the importance of this concept in
communication we also considered the combination of these alphabets
with block or trellis codes. We have not described the interesting
connection between lattices, group codes and combined modulation and
coding; this beautiful subject is thoroughly developed in the
fundamental book [21] by Conway and Sloane.

In  this  paper  no  essentially  new  results  were  proposed.  However  we  hope 
that  the  presentation  of  a  topic  which  is  earning  a  prominent  position 
with  increasing  applications  in  the  new  global  communication  system 
will  be  of  some  interest,  especially  to  young  researchers  who  are 
looking  for  fruitful  areas  of  research  with  high  scientific  content  and 
useful  applications. 

We think that group code theory, which may be credited with a long
history dating back to the ancient regular polyhedra, is a good example
of Feller's conception of mathematics [56]. In fact we wish to conclude
with Feller's words:

"The manner in which mathematical theories are applied does
not depend on preconceived ideas: it is a purposeful technique
depending on, and changing with experience".
Abstract

Addition chains are finite increasing sequences of positive integers, useful
for the efficient evaluation of powers over rings. Many features of addition
chains are considered, and some results related to the still open
Scholz-Brauer conjecture are presented.


1  Introduction 

In many fields, such as number theory, cryptography, computer science, or
numerical analysis, an efficient computation of

$$x^n = \underbrace{x \cdot x \cdots x}_{n} \qquad (1)$$

is often required, where n is a positive integer ($n \in \mathbb{Z}$) and x can belong to any set
$Z$ (usually a ring) in which an associative multiplication with identity is defined.
It was at once observed that the computation of (1) can be obtained through a
sequence

$$x, x^{a_1}, x^{a_2}, \ldots, x^{a_i}, \ldots, x^n$$

where each element $x^{a_i}$ is the product of two previous ones. It turns out that the
nth power of x can be associated to the sequence of integers

$$1 = a_0 < a_1 < a_2 < \cdots < a_r = n \qquad (2)$$

with the property that, for every i, a couple (j, k) can be found, such that

$$a_i = a_j + a_k, \qquad i > j \ge k.$$

* This work was financially supported in part by the United States Army through its European
Research Office, under grant n. DAJA45-86-C-0044.


The sequence (2) is called an addition chain for n. Without loss of generality, the
$a_i$'s are assumed to be sorted in ascending order, and with no duplications.
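The definition translates directly into a check and into a power evaluation that spends one multiplication per chain element after the first. The sketch below is illustrative; the names `is_addition_chain` and `power_by_chain` and the sample chain for n = 15 are choices made here, not taken from the original.

```python
def is_addition_chain(chain, n):
    """Check that chain is 1 = a_0 < a_1 < ... < a_r = n with a_i = a_j + a_k."""
    if not chain or chain[0] != 1 or chain[-1] != n:
        return False
    for i in range(1, len(chain)):
        earlier = chain[:i]
        if chain[i] <= chain[i - 1]:
            return False                      # must be strictly increasing
        if not any((chain[i] - a) in earlier for a in earlier):
            return False                      # not a sum of two earlier terms
    return True

def power_by_chain(x, chain):
    """Evaluate x**chain[-1] along a valid addition chain.

    Returns the power and the number of multiplications used, which is
    len(chain) - 1. Validate the chain with is_addition_chain first.
    """
    powers = {1: x}
    multiplications = 0
    for i in range(1, len(chain)):
        a = chain[i]
        b = next(b for b in chain[:i] if (a - b) in powers)
        powers[a] = powers[b] * powers[a - b]
        multiplications += 1
    return powers[chain[-1]], multiplications

chain15 = [1, 2, 3, 6, 12, 15]
```

The chain `chain15` reaches the exponent 15 in five multiplications.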

The problems typical of the evaluation of powers have been thoroughly
discussed by Knuth [1] and by Borodin and Munro [2]. In particular [1] reports on
many problems that are still open and that deserve attention both as research
problems and for their importance in many applications.

Let  us  now  recall  some  examples  where  the  evaluation  of  powers  is  a  crucial 
point. 

•  First of all, the widely discussed public key cryptographic
scheme proposed by Rivest, Shamir and Adleman [3] requires the search
for two large (several hundred digits) prime numbers p and q, and the
evaluation of powers in $Z_{pq}$, the ring of the residues modulo pq.

•  As a second example let us consider the computation of inverses in a finite
field GF(q); it is well known [4] that the inverse of every non-zero element
is given by

$$a^{-1} = a^{q-2}$$

and in many applications the size of q makes this computation as heavy as
those required in the previous example.

•  As a third example, let us consider the computation of roots in finite fields.
Given $a \in GF(q)$, let k be the root index; we want to compute the expression

$$b = a^{1/k}$$

whenever it exists. A sufficient condition for the existence is that k has an
inverse in the ring $Z_{q-1}$, i.e. there exists an integer f(k) such that

$$k f(k) \equiv 1 \mod q - 1.$$

Under this condition we have

$$b = a^{f(k)}.$$

If k has no inverse in $Z_{q-1}$ then more tests on a are needed to know
whether its k-th root exists.

•  As a final example, the generation of pseudo-random sequences
$x_0, x_1, \ldots, x_n, \ldots$ by the purely multiplicative congruential method, using
the iterative relation

$$x_{n+1} = a x_n \mod m,$$

requires multipliers a that are primitive elements in $Z_m$ in order to generate
sequences with maximum period. The test for a number to be primitive
may consist in raising the number being tested to quantities related to the
factors of $\varphi(m)$.¹

In many interesting cases these exponents have the same order of magnitude
as m, hence they are rather sizable for non-trivial periods. Moreover all
the operations must be done fully exploiting the finite size registers of the
underlying machine if long periodicity is desired (see [5]), so that even the
simple multiplication can be fairly costly.
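Three of the examples above reduce to a single modular exponentiation each. The sketch below illustrates them for the special case of a prime modulus, where field elements are plain integers; this restriction, the function names, and the trial-division factoring are assumptions of the sketch. It relies on Python's built-in three-argument `pow`, which from Python 3.8 also accepts the exponent -1 to compute a modular inverse.

```python
def inverse_mod_prime(a, q):
    """Inverse of a nonzero element of GF(q), q prime, as a^(q-2)."""
    return pow(a, q - 2, q)

def kth_root_mod_prime(a, k, q):
    """k-th root of a in GF(q), q prime, as a^f(k) with k*f(k) = 1 mod q-1.

    pow(k, -1, q - 1) raises ValueError when k has no inverse modulo q-1,
    which is the case where further tests on a would be needed.
    """
    f = pow(k, -1, q - 1)
    return pow(a, f, q)

def prime_factors(n):
    """Distinct prime factors of n by trial division (fine for small n)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def is_primitive_mod_prime(a, m):
    """Primitivity test for a modulo a prime m, where phi(m) = m - 1."""
    phi = m - 1
    return all(pow(a, phi // p, m) != 1 for p in prime_factors(phi))
```

Each call to `pow` hides exactly the kind of power evaluation whose cost is studied in the following sections.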


2  Power  Evaluations 

In this section we discuss the direct and simplest approaches to power evaluation,
since they give insight into trickier theoretical problems.

Several schemes have been proposed and compared in order to minimize the
effort (i.e. the number of multiplications) for evaluating (1), but it seems that none
is definitively preferable in the general case. The choice of one method over
another is affected by a number of constraints, aims or available resources,
namely:

•  the  order  of  magnitude  of  the  exponent  n; 

•  the  availability  of  storage  for  precomputed  tables; 

•  whether  the  situation  calls  for 

1.  independent  evaluations  of  the  power  (1); 

2.  evaluations of several powers of the same base x;

3.  evaluations  of  several  powers  to  the  same  exponent  n. 

In this paper we do not pursue a complete comparison of all these different
situations, but we will be interested mainly in the minimum number of products
necessary to evaluate (1). In other words we will restrict our attention to the
study of the function l(n), defined as

    l(n) = minimum number of products for evaluating the
           n-th power in an associative ring.                    (3)

At first sight a very economical evaluation of (1) is obtained by the binary
decomposition of the exponent n, which leads to a number of multiplications
upper bounded by $2\lfloor \log_2 n \rfloor$. The same decomposition implies the simple but
tight lower bound $\lceil \log_2 n \rceil$. Most considerations about the evaluation of powers
concern the estimation of tighter upper bounds.

¹ $\varphi$ is the Euler totient function.

2.1 The right to left binary method

If we write

$$n = \sum_{i=0}^{t} b_i 2^i, \qquad b_i \in \{0, 1\}, \qquad (4)$$

where $t = \lfloor \log_2 n \rfloor$, the power (1) can be computed as

$$x^n = \prod_{i=0}^{t} \left(x^{2^i}\right)^{b_i}. \qquad (5)$$

Given that the $b_i$'s can be only 0 or 1, raising to $b_i$ is straightforward. We shall
call this approach the right to left binary method.

In (5) t multiplications are required to evaluate the powers

$$x^{2^i}, \qquad i = 1, 2, \ldots, t \qquad (6)$$

and one more multiplication is needed for every non-zero $b_i$, $i < t$, leading to a
total of

$$\lfloor \log_2 n \rfloor + \nu(n) - 1$$

multiplications, where $\nu(n)$ is the number of 1's in the binary representation of
n. The storage required by an implementation of the binary method (5) can be
reduced to three memory cells: one to hold the successive powers (6), another to
hold n during its decomposition, and an accumulator for the result.
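A minimal sketch of the method with the three memory cells just described (the successive squares, the shrinking exponent, and the accumulator) follows. The function name and the returned multiplication counter are additions made here for illustration.

```python
def power_rl_binary(x, n):
    """Right to left binary method for x**n, n >= 1.

    Returns (x**n, number of multiplications performed).
    """
    count = 0
    result = None          # accumulator; None until the first 1 bit is met
    square = x             # holds x^(2^i) for the current bit position i
    while True:
        if n & 1:
            if result is None:
                result = square            # first factor costs nothing
            else:
                result = result * square
                count += 1
        n >>= 1
        if n == 0:
            break
        square = square * square           # next power in (6)
        count += 1
    return result, count
```

For n = 15 the routine performs 3 squarings and 3 further products, in agreement with the count $\lfloor \log_2 n \rfloor + \nu(n) - 1 = 3 + 4 - 1$.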

The right to left binary method can be generalized to an m-ary method in
the following way [6]. Let

$$t = \lfloor \log_m n \rfloor \qquad (7)$$

and consider the m-ary decomposition of the exponent n

$$n = \sum_{i=0}^{t} d_i m^i, \qquad d_i \in \{0, 1, \ldots, m-1\}. \qquad (8)$$

This decomposition can be rewritten as

$$n = \sum_{i \in J_1} m^i + 2 \sum_{i \in J_2} m^i + \cdots + (m-1) \sum_{i \in J_{m-1}} m^i, \qquad (9)$$

where $J_j$ denotes the set of indices i such that the coefficients $d_i$ in (8) are equal
to j.


The  right  to  left  m-ary  method  can  be  described  by  the  following  procedure. 


Step 1 of procedure (10) requires at most t l(m) multiplications, if l(m) is the
minimum number of multiplications for raising a number to its m-th power;
actually, on average, not all the terms in Step 1 will be necessary. Raising to
j in Step 2 requires l(j) multiplications, while the remaining operations in Steps
2 and 3 can be carried out with no more than t − 1 multiplications. The total
number of multiplications is bounded by

$$t\,l(m) + t - 1 + \sum_{j=1}^{m-1} l(j). \qquad (11)$$
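One way to organize a right to left m-ary evaluation that matches decomposition (9) and the three steps costed above is sketched below. It is an assumption about the shape of such a procedure, not a transcription of procedure (10): the successive powers $x^{m^i}$ are formed, the ones sharing the same digit j are multiplied together, each group product is raised to j, and the results are combined. Python's `**` stands in for the "raise to the m-th power" and "raise to j" operations, whose costs in the text are l(m) and l(j).

```python
def power_rl_mary(x, n, m):
    """Right to left m-ary evaluation of x**n for n >= 1 and m >= 2."""
    group_product = {}     # digit j -> product of x^(m^i) over i in J_j
    power = x              # x^(m^i) for the current digit position i
    while n > 0:
        n, digit = divmod(n, m)
        if digit:
            if digit in group_product:
                group_product[digit] = group_product[digit] * power
            else:
                group_product[digit] = power
        if n > 0:
            power = power ** m             # next power x^(m^(i+1))
    result = None
    for j, product in group_product.items():
        term = product ** j                # raise the group J_j to j
        result = term if result is None else result * term
    return result
```

With m = 2 only the group $J_1$ exists and the routine reduces to the right to left binary method.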

2.2  The  left  to  right  binary  method 

Another way of computing (1) is to rewrite the exponent n from (4) by Horner's
rule for evaluating polynomials

$$n = b_0 + 2(b_1 + 2(b_2 + 2(b_3 + 2(\cdots + 2b_t)\cdots))).$$

We shall refer to this approach as the left to right binary method, since a left to
right scanning of n's binary representation is required.

The left to right binary method, extended to an m-ary method, is described
by the following procedure, based upon the decomposition (8).

Step 1.  COMPUTE AND STORE                                        (12)
             x, x^2, x^3, ..., x^(m-1);

Step 2.  LET i = t;
         START WITH x^(d_i);

Step 3.  REPEAT
             LET i = i - 1;
             RAISE TO THE m-TH POWER;
             IF d_i IS NOT 0
                 MULTIPLY BY x^(d_i);
         UNTIL i = 0;
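The steps of procedure (12) can be sketched as runnable code as follows. The function name is hypothetical, the loop is written so that a single-digit exponent (t = 0) skips Step 3, and Python's `**` stands in for the m-th power, which costs l(m) multiplications in the accounting of the text.

```python
def power_lr_mary(x, n, m=2):
    """Left to right m-ary evaluation of x**n for n >= 1 and m >= 2."""
    # Digits d_0, ..., d_t of n in base m, least significant first.
    digits = []
    while n > 0:
        n, d = divmod(n, m)
        digits.append(d)

    # Step 1: precompute and store x, x^2, ..., x^(m-1).
    table = {1: x}
    for j in range(2, m):
        table[j] = table[j - 1] * x

    # Step 2: start with x^(d_t); the leading digit is never zero.
    result = table[digits[-1]]

    # Step 3: scan the remaining digits from left to right.
    for d in reversed(digits[:-1]):
        result = result ** m               # raise to the m-th power
        if d:
            result = result * table[d]     # multiply by x^(d_i)
    return result
```

With m = 2 the table holds only x and the loop is the familiar square-and-multiply scan of the binary digits.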




Table 1: Upper bounds to the number of multiplications in computing (1).

base m    right to left,          left to right,
          procedure (10)          procedure (12)

  2       2⌊log2 n⌋ - 1           2⌊log2 n⌋
  3       3⌊log3 n⌋               3⌊log3 n⌋ + 1
  4       3⌊log4 n⌋ + 2           3⌊log4 n⌋ + 2
  5       4⌊log5 n⌋ + 4           4⌊log5 n⌋ + 3
  6       4⌊log6 n⌋ + 7           4⌊log6 n⌋ + 4
  7       5⌊log7 n⌋ + 10          5⌊log7 n⌋ + 5
  8       4⌊log8 n⌋ + 14          4⌊log8 n⌋ + 6

Note that a certain amount of storage is necessary for the quantities computed
in the first step of the above procedure; moreover the base-m representation of
n must be available in left to right order.

Step 1 of procedure (12) requires at most m − 2 multiplications; actually the
$x^{d_i}$ do not need to be computed for those values of $d_i$ not present in the
decomposition (8). Each iteration of Step 3 requires at most l(m) + 1 multiplications;
the +1 is present only if the i-th digit $d_i$ is not 0. The total number of multiplications
is bounded by

$$m - 2 + t\,(l(m) + 1). \qquad (13)$$


2.3  Bounds  for  l(n) 

A lot of work concerns the search for tight bounds for l(n). By comparing the
bounds (11) and (13), Table 1 can be built, where t is expressed as in (7). The
order of magnitude of the exponent n can be seen to affect the choice of the
base m; the optimal m increases with n. As an example, the base 4 should be
preferred to the base 2 whenever n > 128. Moreover, those bases that are powers
of 2 appear somehow optimal, since they lead to comparatively small coefficients
for $\lfloor \log_m n \rfloor$ in Table 1.
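The entries of Table 1 follow from (11) and (13) once l(m) is known for m ≤ 8. The sketch below recomputes, for each base, the coefficient of t and the additive constant; the values of l(m) for small m are the standard ones for shortest addition chains and are entered by hand here, and the names `L_SMALL`, `right_to_left_bound` and `left_to_right_bound` are hypothetical.

```python
# Shortest addition chain lengths l(m) for m = 1..8 (standard values).
L_SMALL = {1: 0, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 3}

def right_to_left_bound(m):
    """(coefficient of t, constant) in bound (11): t*l(m) + t - 1 + sum l(j)."""
    return L_SMALL[m] + 1, sum(L_SMALL[j] for j in range(1, m)) - 1

def left_to_right_bound(m):
    """(coefficient of t, constant) in bound (13): m - 2 + t*(l(m) + 1)."""
    return L_SMALL[m] + 1, m - 2

table1 = {m: (right_to_left_bound(m), left_to_right_bound(m)) for m in range(2, 9)}
```

For base 8, for instance, `table1[8]` gives the pairs (4, 14) and (4, 6). This makes the trend visible: the constant term of (11) grows much faster with m than that of (13), while both share the coefficient l(m) + 1.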

Even  if  the  left  to  right  m-ary  method  seems  to  behave  better  for  large  bases 
m,  a  careful  inspection  of  the  bounds  (11)  and  (13)  shows  that  the  bound  (11)  is 
weaker,  since  Steps  1  and  2  of  procedure  (10)  are  open  to  several  optimizations 
both  in  the  case  of  few  and  the  case  of  many  terms  in  the  decomposition  (8). 

When p (> 1) powers of the same base x are to be evaluated, the right to
left method becomes advantageous. In this case, in fact, the precomputations in
Step 1 of both procedures (10) and (12) can be executed only once, so that the
bounds

$$t\,l(m) + p \left( t - 1 + \sum_{j=1}^{m-1} l(j) \right) \qquad (14)$$

for the right to left method, and

$$m - 2 + p\,t\,(l(m) + 1) \qquad (15)$$

for the left to right method, can be derived. The bound (14) is tighter than (15),
since the coefficient of p is smaller.

It is known that the bounds presented above are asymptotically (for large $n$) equivalent. Considering the left to right binary method, we can write

    $\lceil \log_2 n \rceil \le l(n) \le \lfloor \log_2 n \rfloor + \nu(n) - 1$.    (16)

Since $\nu(n) \le \lceil \log_2 n \rceil$, and

    $\lfloor \log_2 n \rfloor + \lceil \log_2 n \rceil \le 2 \lfloor \log_2 n \rfloor + 1$,

the bounds (16) can be rewritten as

    $\lceil \log_2 n \rceil \le l(n) \le 2 \lfloor \log_2 n \rfloor$.    (17)
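The upper bound in (16) is the cost of the left to right binary method. The following minimal Python sketch counts the multiplications actually performed and compares them with (16) and (17); the function names `binary_power` and `nu` are illustrative and do not come from the paper.

```python
def nu(n):
    """Number of 1's in the binary representation of n."""
    return bin(n).count("1")


def binary_power(x, n):
    """Left to right binary method: return (x**n, multiplications used)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    bits = bin(n)[2:]            # binary digits, most significant first
    result, mults = x, 0         # the leading 1 digit costs nothing
    for bit in bits[1:]:
        result *= result         # one squaring per remaining digit
        mults += 1
        if bit == "1":
            result *= x          # one extra multiplication per 1 digit
            mults += 1
    return result, mults
```

For instance, `binary_power(3, 15)` uses 6 multiplications, which is exactly $\lfloor \log_2 15 \rfloor + \nu(15) - 1 = 3 + 4 - 1$.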

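Considering the m-ary methods, and substituting $t = \lfloor \log_m n \rfloor$ in (11), the number $l(n)$ of multiplications for raising to the power $n$ is bounded by the number of multiplications required by the m-ary method which, for $m = 2^s$, takes the form

    $l(n) \le \left(1 + \frac{1}{s}\right) \lfloor \log_2 n \rfloor + 2^s$.    (18)

If we let $s = \log_2 \log_2 n - 2 \log_2 \log_2 \log_2 n$, (18) becomes

    $l(n) \le \left(1 + \frac{1}{\log_2 \log_2 n}\right) \log_2 n + O\!\left(\frac{\log_2 n \, \log_2 \log_2 \log_2 n}{(\log_2 \log_2 n)^2}\right)$.    (19)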

This result is due to Brauer [7] and reported by Knuth [1, page 451, Theorem D]. It is as tight as possible because of a probabilistic asymptotic upper bound to $l(n)$, due to Erdos [8], which asserts that the probability that

    $l(n) < \log_2 n + (1 - \epsilon)\,\frac{\log_2 n}{\log_2 \log_2 n}$    (20)

is definitively less than 1 for any $\epsilon > 0$, or, equivalently, that there always are $n$'s for which the inequality (20) is reversed.

Also the lower bound $\lceil \log_2 n \rceil$ can be strengthened; in fact Schonhage [9] has shown that the following lower bound holds for every $n$

    $l(n) \ge \log_2 n + \log_2 \nu(n) - 2.12$    $(\nu(n) > 4)$.
3 Addition Chains


Addition chains are the tool for solving the problem of computing (1) for a given $n$ with the minimum number of multiplications. Note that this problem is only a particular case of problem (1), in the sense that nothing is said about the cost of deriving $l(n)$; and this cost can exceed by far the cost of computing (1) by any one of the previously quoted methods. Nevertheless addition chains are useful for the evaluation of powers both from the theoretical standpoint and when several quantities need to be raised to the same fixed exponent.

Addition chains have been formally defined in the introduction as sequences of integers

    $1 = a_0 < a_1 < a_2 < \dots < a_r = n$

with the property that, for every $i$, a pair $(j, k)$ can be found such that

    $a_i = a_j + a_k, \quad i > j \ge k$.    (21)

It turns out that if $r$ is the minimum number for which there exists an addition chain of length $r$ for $n$, then this addition chain is a solution to the problem stated at the beginning, and $l(n) = r$.
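The definition (21) translates directly into a checker, and a valid chain of length $r$ gives a way of computing the $n$-th power with $r$ multiplications. The sketch below is only illustrative; the names `is_addition_chain` and `power_by_chain` are assumptions, not notation from the paper.

```python
def is_addition_chain(chain, n):
    """True if chain is 1 = a_0 < a_1 < ... < a_r = n and satisfies (21)."""
    if not chain or chain[0] != 1 or chain[-1] != n:
        return False
    for i in range(1, len(chain)):
        if chain[i] <= chain[i - 1]:
            return False
        earlier = set(chain[:i])
        # a_i must be the sum of two (possibly equal) earlier elements
        if not any((chain[i] - a) in earlier for a in earlier):
            return False
    return True


def power_by_chain(x, chain):
    """Return (x**chain[-1], multiplications), one multiplication per step."""
    powers = {1: x}
    mults = 0
    for i in range(1, len(chain)):
        a = chain[i]
        # pick an earlier element b such that a - b is also available
        b = next(e for e in chain[:i] if (a - e) in powers)
        powers[a] = powers[b] * powers[a - b]
        mults += 1
    return powers[chain[-1]], mults
```

With the chain 1, 2, 3, 6, 12, 15 the 15-th power costs 5 multiplications, one fewer than the left to right binary method.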

It is convenient to define two special classes of addition chains. A star chain is defined as in (21) with the stronger constraint $j = i - 1$. An $l^0$-chain is an addition chain with some marked elements; the condition is that in (21) $a_j$ is the largest marked element less than $a_i$. It can be shown that

    $l(n) \le l^0(n) \le l^*(n)$,    (22)

where $l^0(n)$ and $l^*(n)$ are defined in a way similar to $l(n)$, respectively for $l^0$-chains and star chains.

A lot has been written about addition chains (see [1] for a presentation of the main results), but the problem of finding $l(n)$ is not completely settled, in the sense that $l(n)$ is not known for all $n$.

Bounds for the function $l(n)$ were shown in the previous Section.

3.1  Functions  Related  To  Addition  Chains 

Many interesting functions are related to $l(n)$; here we consider two such functions, which are defined as follows:

    $c(r)$ = minimum integer $n$ such that $l(n) = r$;    (23)

    $d(r)$ = number of solutions in $n$ to the equation $l(n) = r$.    (24)

For a generic $n$, for which $l(n) = r$, the following bounds hold

    $2^{r/2} \le c(r) \le 2^r$;    (25)

the upper bound is straightforward from the definition (23) of $c(r)$, while the lower bound comes from the upper bound in (17). Using the results shown in Table 1, the lower bound can be tightened to the form $2^{F(r)}$, with $F(r) = ar + b$; as an example, exploiting the decomposition to the base 3, we obtain $a = 0.53$ and $b = 0$, which is always tighter than $2^{r/2}$.

Moreover,  the  same  lower  bound  can  be  significantly  improved  using  (19);  in 
fact,  after  some  algebraic  manipulations,  we  can  obtain  the  asymptotic  bounds 

2r“I^+0(^)  <  c(r)  <  2r.  (26) 

From  this  and  the  previous  relations  the  asymptotic  behavior  of  the  function  c(r) 
will  be 

c(r)  =  2'  +  o(2r). 

From (25), and from the definition of $d(r)$, the following inequality can be stated

    $d(r) \le 2^r - c(r) + 1$;

hence

    $d(r) + c(r) \le 2^r + 1$.

It is reasonable to conjecture that $d(r)$ behaves asymptotically as an r-th power of 2:

    $d(r) = O\!\left(2^{d' r}\right)$,

where $d'$ is a constant close to 1.

The known values of $c(r)$ and $d(r)$ for small values of $r$, taken from Knuth [1], are shown in Table 2 where, for the sake of comparison, some of the bounds derived in this Section are also reported.
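For very small $r$ the functions $c(r)$ and $d(r)$ can be obtained by exhaustive enumeration of all addition chains. The sketch below is a brute-force illustration (it is not the method used to produce the published values, and the function names are assumptions); its cost grows very quickly with the maximum chain length.

```python
def shortest_chain_lengths(max_len):
    """Map n -> l(n) for every n having an addition chain of length <= max_len."""
    best = {1: 0}

    def extend(chain):
        r = len(chain) - 1
        if r == max_len:
            return
        last = chain[-1]
        # every sum of two chain elements exceeding the last one is a valid next step
        for s in {a + b for a in chain for b in chain if a + b > last}:
            if r + 1 < best.get(s, max_len + 1):
                best[s] = r + 1
            extend(chain + [s])

    extend([1])
    return best


def c_and_d(max_len):
    """Return {r: (c(r), d(r))} for r = 1, ..., max_len."""
    best = shortest_chain_lengths(max_len)
    table = {}
    for r in range(1, max_len + 1):
        ns = [n for n, length in best.items() if length == r]
        table[r] = (min(ns), len(ns))
    return table
```

Calling `c_and_d(5)` reproduces the first rows of the table of known values, and each pair satisfies $d(r) \le 2^r - c(r) + 1$.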

4  The  Scholz-Brauer  Conjecture 

A famous problem concerning addition chains is the Scholz-Brauer conjecture [10]. This conjecture refers to the chains for $2^n - 1$, which are of special interest, since they are the worst case for the binary method (their binary representation is a string of 1's). Let us call a number $n$ satisfying the inequality

    $l(2^n - 1) \le n - 1 + l(n)$,    (27)

where $l(n)$ is defined in (3), a SB-number.
Table 2: c(r), d(r) and related bounds.

     r    2^(r/2)   2^F(r)    d(r)     c(r)   nu(c(r))      2^r
     1      1.41      2          1        2       1            2
     2      2         2.83       2        3       2            4
     3      2.83      4          3        5       2            8
     4      4         5.66       5        7       3           16
     5      5.66      8          9       11       3           32
     6      8        11.31      15       19       3           64
     7     11.31     16         26       29       4          128
     8     16        22.63      44       47       5          256
     9     22.63     32         78       71       4          512
    10     32        45.25     136      127       7         1024
    11     45.25     64        246      191       7         2048
    12     64       101.6      432      397       5         4096
    13     90.5     161.3      772      607       7         8192
    14    128       256       1382     1087       7        16384
    15    181.0     406.4     2481     1903       9        32768
    16    256       645.1               3583      11        65536
    17    362.0    1024                 6271       9       131072
    18    512      1625                11231      11       262144

The longstanding Scholz-Brauer conjecture states that

    all positive integers are SB-numbers.

In the following, it will be shown that (27) holds for infinitely many $n$. Let us recall some of the properties of $l(n)$, reported from [1]; they will be useful in the sequel.


    $l(nm) \le l(n) + l(m)$;    (28)

    $l(2^a) = a$;    (29)

    $l(2^a + 2^b) = a + 1$  if $a > b \ge 0$;    (30)

    $l(2^a + 2^b + 2^c) = a + 2$  if $a > b > c \ge 0$    (31)

(this is Theorem B in [1]);

    $a + 2 \le l(2^a + 2^b + 2^c + 2^d) \le a + 3$  if $a > b > c > d \ge 0$,

where $n = 2^a + 2^b + 2^c + 2^d$ is said to be special (see [1, p. 449]) if the lower bound holds with equality (this is called Theorem C in [1]);

    $l^0(2^n - 1) \le n - 1 + l^0(n)$;    (32)

this implies that the Scholz-Brauer conjecture holds for $l^0$-chains (the result, due to Hansen, is called Theorem G in [1]).

Lemma 1 If $l(n) = l^*(n)$ then $n$ is a SB-number.

Proof  -  Straightforward  from  (22)  and  (32). 

□ 

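Lemma 2 For every integers $a$ and $k$, the following inequality holds

    $l\!\left(\frac{2^{k 2^a} - 1}{2^k - 1}\right) \le k 2^a - k + a$.    (33)

Proof - It is direct to verify (33) for $a = 0$. Now let us suppose (33) is satisfied for $a - 1$; thus, using (28) and (29), we have

    $l\!\left(\frac{2^{k 2^a} - 1}{2^k - 1}\right) = l\!\left(\frac{2^{k 2^{a-1}} - 1}{2^k - 1}\,\bigl(2^{k 2^{a-1}} + 1\bigr)\right) \le$

    $\le k 2^{a-1} - k + a - 1 + k 2^{a-1} + 1 \le$

    $\le k 2^a - k + a$.

The validity of (33) for every $a$ follows from the induction principle.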

□ 
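The induction in the proof is constructive: each step multiplies the current quotient by $2^{k 2^h} + 1$, that is, $k 2^h$ doublings followed by one addition. The following Python sketch builds the resulting chain and can be used to check the length $k 2^a - k + a$ numerically; the function names are illustrative assumptions.

```python
def is_addition_chain(chain):
    """True if chain starts at 1, increases, and each term is a sum of two earlier terms."""
    seen = set()
    for idx, value in enumerate(chain):
        if idx == 0:
            if value != 1:
                return False
        else:
            if value <= chain[idx - 1]:
                return False
            if not any((value - a) in seen for a in seen):
                return False
        seen.add(value)
    return True


def lemma2_chain(k, a):
    """Addition chain for (2**(k * 2**a) - 1) // (2**k - 1) following the induction step."""
    chain = [1]
    for h in range(a):
        base = chain[-1]                 # equals (2**(k * 2**h) - 1) // (2**k - 1)
        x = base
        for _ in range(k * 2 ** h):      # multiply by 2**(k * 2**h) through doublings
            x *= 2
            chain.append(x)
        chain.append(x + base)           # one addition completes the factor 2**(k * 2**h) + 1
    return chain
```

For $k = 1$ and $a = 2$ this gives the chain 1, 2, 3, 6, 12, 15 of length 5.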

Note that the recursive argument used in the proof above also defines, in the case $k = 1$, an addition chain which contains numbers of the form

    $2^{\ell}\,(2^{2^h} - 1), \quad 0 \le \ell \le 2^h; \quad 1 \le h \le a - 1$.    (34)

For later use, we state this point as a Corollary.

Corollary 1 There exists an addition chain for $2^{2^a} - 1$ of length $2^a - 1 + a$, such that it contains the numbers (34). This addition chain has the form

    $\dots, (2^{2^h} - 1),\ 2\,(2^{2^h} - 1),\ \dots,\ 2^{2^h} (2^{2^h} - 1),\ (2^{2^{h+1}} - 1),\ \dots$

Note that

    $2^{2^h} (2^{2^h} - 1) + (2^{2^h} - 1) = 2^{2^{h+1}} - 1$.


Theorem 1 For every positive integer $n$ the inequality

    $l(2^n - 1) \le n - 2 + \nu(n) + \lfloor \log_2 n \rfloor$    (35)

holds.

Proof - By decomposing $n$ into its binary representation as in (4), $n = \sum_{i=0}^{t} b_i 2^i$, we can write

    $2^n - 1 = 2^{\sum_{i=0}^{t-1} b_i 2^i} (2^{b_t 2^t} - 1) + 2^{\sum_{i=0}^{t-2} b_i 2^i} (2^{b_{t-1} 2^{t-1}} - 1) + \dots + (2^{b_0} - 1) =$

    $= \sum_{i=0}^{t} 2^{\sum_{j=0}^{i-1} b_j 2^j} (2^{b_i 2^i} - 1)$.    (36)

Applying Corollary 1 it can be seen that all the $\nu(n)$ nonzero terms in the summation but the first are in the chain for $2^{2^t} - 1$, whose length, according to Lemma 2, is bounded by $2^t - 1 + t$. Since the first factor in the first term can be expressed as $2^{n - 2^t}$, it accounts for at most $n - 2^t$ multiplications. Combining these two contributions with the $\nu(n) - 1$ additional multiplications required by the $\nu(n)$ nonzero terms in the decomposition (36), the Theorem is proved.

□ 
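The proof of Theorem 1 describes an explicit chain for $2^n - 1$: the chain of Corollary 1 up to $2^{2^t} - 1$, then $n - 2^t$ doublings, then one addition for each remaining nonzero term of (36). A minimal Python sketch of this construction follows; the function names are assumptions, and the chain it returns only attains the bound (35), so it is not claimed to be of minimal length.

```python
def is_addition_chain(chain):
    """True if chain starts at 1, increases, and each term is a sum of two earlier terms."""
    seen = set()
    for idx, value in enumerate(chain):
        if idx == 0:
            if value != 1:
                return False
        else:
            if value <= chain[idx - 1]:
                return False
            if not any((value - a) in seen for a in seen):
                return False
        seen.add(value)
    return True


def theorem1_chain(n):
    """Addition chain for 2**n - 1 built as in the proof of Theorem 1."""
    t = n.bit_length() - 1               # t = floor(log2 n)
    chain = [1]
    for h in range(t):                   # Corollary 1 chain, ends at 2**(2**t) - 1
        base = chain[-1]
        x = base
        for _ in range(2 ** h):
            x *= 2
            chain.append(x)
        chain.append(x + base)
    x = chain[-1]
    for _ in range(n - 2 ** t):          # first term of (36): shift by n - 2**t
        x *= 2
        chain.append(x)
    shift, terms = 0, []
    for i in range(t):                   # remaining nonzero terms of (36)
        if (n >> i) & 1:
            terms.append((2 ** (2 ** i) - 1) << shift)
            shift += 2 ** i
    for term in terms:                   # each term is already in the chain
        x += term
        chain.append(x)
    return chain
```

As an example, `theorem1_chain(5)` returns 1, 2, 3, 6, 12, 15, 30, 31, whose length 7 equals $n - 2 + \nu(n) + \lfloor \log_2 n \rfloor = 5 - 2 + 2 + 2$.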


Corollary 2 If $l(n) = \lfloor \log_2 n \rfloor + \nu(n) - 1$ then $n$ is a SB-number.

Theorem 2 Every $n$ such that $\nu(n)$ is not greater than 4 is a SB-number.

Proof - The proof of Theorem 2 is given separately for the four cases $\nu(n) = 1, \dots, 4$.

Case $\nu(n) = 1$ - Proved in Lemma 2 with $k = 1$.

Case $\nu(n) = 2$ - It must be shown that, for every integer $a$ and $b$ such that $a > b \ge 0$, the following inequality holds

    $l(2^{2^a + 2^b} - 1) \le 2^a + 2^b + a$.

We can write

    $2^{2^a + 2^b} - 1 = 2^{2^b} (2^{2^a} - 1) + 2^{2^b} - 1$.

From Corollary 1 we know that $2^{2^b} - 1$ belongs to the addition chain ending in $2^{2^a} - 1$, so that, using Lemma 2, we have

    $l(2^{2^a + 2^b} - 1) \le l(2^{2^a} - 1) + 2^b + 1 \le 2^a + 2^b + a$.

Case $\nu(n) = 3$ - It must be shown that, for every $a$, $b$ and $c$ such that $a > b > c \ge 0$, the following inequality holds

    $l(2^{2^a + 2^b + 2^c} - 1) \le 2^a + 2^b + 2^c + a + 1$.

In a way similar to the case $\nu(n) = 2$, using Corollary 1 and Lemma 2, the proof stems from the equality

    $2^{2^a + 2^b + 2^c} - 1 = 2^{2^b + 2^c} (2^{2^a} - 1) + 2^{2^c} (2^{2^b} - 1) + 2^{2^c} - 1$.

Case $\nu(n) = 4$ - Two subcases must be considered: $l(n) = a + 3$ and $l(n) = a + 2$. In the first case the proof follows from Theorem 1. In the second case it follows from Exercise 13 in [1, p. 463] (showing that $n$ has a star chain, so that Lemma 1 applies) and (32).

□ 

4.1 Generalizing the Scholz-Brauer Conjecture

The numbers with all 1's in their binary representation behave much better than bound (19). In fact for numbers of the form $2^n - 1$, since $\log_2 n \ge \nu(n) - 1$, the inequality (35) can be rewritten as

    $l(2^n - 1) \le n - 1 + c \log_2 n$,    (37)

where $c$ is a convenient constant, $1 \le c \le 2$. The second term at the right hand side of (20), in this case, has the form

    $\frac{\log_2 (2^n - 1)}{\log_2 \log_2 (2^n - 1)} \approx \frac{n}{\log_2 n}$

and, for large $n$, the inequality

    $c \log_2 n < \frac{n}{\log_2 n}$

holds.

Improvements  on  the  upper  bound  for  f(n)  are  shown  by  numbers  which  have 
some  regular  patterns  in  their  binary  representation.  As  an  example  we  consider 
the  following  Theorem 

Theorem  3  For  every  positive  integer  M  of  the  form 
o-i  »-< 

M  +2‘^fc1  +  (2’  =  (2‘  -  1)t2'M,  (3c) 

1--0  1-0 

the following upper bound holds

t(M)  <  s  -2-t  k(M,)  r  .-(»)+  |  log2  tj 

I  Vs 


(39) 


Proof  -  The  proof,  applying  Theorem  1,  is  straightforward. 

□

Along the same lines, if we consider numbers of the form

    $M = 1 + 2^k + 2^{2k} + \dots + 2^{(t-1)k}$,

then for $t = 2^a$, Lemma 2 shows that

    $l(M) \le kt - k + l(t)$

and the following Theorem 4 shows that this inequality also holds for every $t$ such that $\nu(t) \le 3$.

Theorem 4 For every integers $k$ and $t$, such that $\nu(t)$ is not greater than 3, the following inequality holds

    $l\!\left(\frac{2^{kt} - 1}{2^k - 1}\right) \le kt - k + l(t)$.    (40)

Case $\nu(t) = 3$ - Let $t = 2^A + 2^B + 2^C$, with $A > B > C \ge 0$. We can write

    $\frac{2^{kt} - 1}{2^k - 1} = 2^{k(2^B + 2^C)}\,\frac{2^{k 2^A} - 1}{2^k - 1} + 2^{k 2^C}\,\frac{2^{k 2^B} - 1}{2^k - 1} + \frac{2^{k 2^C} - 1}{2^k - 1}$.

In a way similar to the case $\nu(t) = 2$, using (28) and (30), and Lemma 2, we can obtain

    $l\!\left(\frac{2^{k(2^A + 2^B + 2^C)} - 1}{2^k - 1}\right) \le k(2^A + 2^B + 2^C) - k + A + 2$.

We can now propose a generalization of the Scholz-Brauer conjecture in the form

    for every $k$ and for every $n$ the following inequality

        $l\!\left(\frac{2^{kn} - 1}{2^k - 1}\right) \le kn - k + l(n)$

    holds.

Note that, for $k = 1$, it reduces to the original conjecture.

5  Conclusions 

Knuth reports that $1 \le n \le 18$ and the sporadic values 20, 24 and 32 are SB-numbers with equality satisfied; moreover he has shown by computer search that $l(n) = l^*(n)$ for all integers less than 12509. As a consequence of Lemma 1, every $n$ less than 12509 is a SB-number, so that 12509 is the smallest candidate for a non SB-number.

An infinity of SB-numbers exists, but it is an open question to prove the Scholz-Brauer conjecture either in the generalized form or not.

Finally, as a consequence of the results presented in this paper, an even more interesting open question seems to be to find the smallest value of $c$ such that (37) holds for every $n$.
Abstract

A representation of the Kerdock code K(m) is given that allows instantaneous encoding and the use of different complete decoding strategies. Applications to error correction and to vector quantisation are described. The particularly interesting code K(4) is thoroughly analysed and the associated bit error rate on the binary symmetric channel is found in closed form.


1  Introduction 

Kerdock codes K(m) are nonlinear codes having many interesting properties, such as high error correcting capabilities, high symmetry and beautiful descriptions. They may be viewed in some way as dual codes of Preparata codes P(m), another noteworthy class of nonlinear codes. The code K(4) is very interesting because, besides the relatively high rate 1/2, it coincides with the Preparata code P(4), so that it looks like a sort of self-dual nonlinear code.

The most obvious application of Kerdock codes is their use as channel codes in communication systems. K(4) may also be viewed as the Nordstrom-Robinson code $N_{16}$ and used as a vector quantizer for encoding random waveforms such as in the case of speech Linear Predictive Coding (LPC) at the rate of 1/2 bit per sample [3]. In a similar way Kerdock codes allow the decoding from data produced by soft demodulation. In both vector quantization and soft data decoding the problem is to minimize an objective function, which most frequently is taken to be the squared-error distortion.

This paper was presented at the IEEE International Symposium on Information Theory, Kobe, Japan, June 1988.

This work was financially supported in part by the United States Army through its European Research Office, under grant n. DAJA45-86-C-0044.

Table 1: Weight distribution for K(m).

In Section 2 a systematic representation of K(m) is given that allows instantaneous encoding and the application of different strategies for a complete decoding, as will be described in Section 3. The application of K(4) to vector quantization will also be described in Section 3. A short analysis of the computational complexity pertaining to the above applications of K(m) will be given in Appendix A. Finally, Section 4 reports some results on the performance evaluation of K(4) used as a channel code on the Binary Symmetric Channel (BSC).

2  A  representation  for  K(m) 

In  this  Section  we  briefly  recall  the  formal  definition  of  Kerdock  codes  in 
order  to  introduce  a  systematic  encoding  scheme.  We  also  collect  some  of 
its  general  properties  for  easy  reference. 

The Kerdock code K(m), m even, is a nonlinear code consisting of the Reed-Muller code R(1,m) of parameters $(2^m, m + 1, 2^{m-1})$ and $2^{m-1} - 1$ cosets of R(1,m) in R(2,m). K(m) is also denoted by $[2^m, 2^{2m}, 2^{m-1} - 2^{m/2-1}]$. Important features of any code are the weight and the distance distributions. The weight distribution of a $[n, M, d]$ code is the set $\{A_i\}_{i=0}^{n}$, where $A_i$ denotes the number of codewords of weight $i$, while the distance distribution is the set $\{B_i\}_{i=0}^{n}$, where $M B_i$ is the number of ordered pairs of codewords such that the distance between them is $i$. Linear codes have $B_i = A_i$ and the same property is shown by Kerdock codes. The weight and distance distribution of K(m), taken from [1], is given in Table 1.

Let $c_R$ be a vector of R(1,m). For later use it is convenient to interpret $c_R$ according to the following decomposition

    $c_R = (\, i \mid z_1 \mid z_2 \,)$,    (1)

where the three subvectors have lengths $m + 1$, $m - 1$ and $2^m - 2m$, respectively.

Let $w_i$ be a coset leader that performs a translation of R(1,m) to generate a codeword $c$ of K(m), i.e.

    $c = w_i + c_R$;

the code K(m) is the union of disjoint cosets of R(1,m), written as follows

    $K(m) = [w_1 + R(1,m)] \cup [w_2 + R(1,m)] \cup \dots \cup [w_{2^{m-1}} + R(1,m)]$.    (2)

The definition of K(m) strongly relies on the choice of the $w_i$, $i = 1, \dots, 2^{m-1}$, which may be obtained by means of primitive idempotents for length $2^{m-1} - 1$, or by using symplectic forms to define a convenient set of boolean functions. A very simple construction of K(4) is given in [2], where the coset leaders are defined through symplectic forms on four variables, which are very easy to obtain.

For easy reference, it is convenient to introduce a binary matrix $W^{(m)}$, built with the coset leaders $w_i$ written by rows, an example of which will be given in (5).

We use a systematic R(1,m) code and its translates, given by coset leaders $w_i$ of a special form, to generate K(m). The following two Lemmas are the formal support to this representation.

Lemma 1 In every coset of a systematic linear $(n, k, d)$ code there exists exactly one word with $k$ consecutive zeros in information positions.

Proof - Let $(i \mid a)$ denote a codeword of a systematic $(n, k, d)$ code, where $i$ is the subvector of information bits. This subvector ranges over the whole set of possible $2^k$ bit patterns. Given a coset leader $(x \mid y)$, there exists one and only one code vector $(x \mid z)$ such that

    $(x \mid z) + (x \mid y) = (0 \mid z + y)$

is the unique element of the coset with $k$ zeros in information positions.

□ 
Now let

    $i = (\, i_1 \mid i_2 \,)$

be a vector of $2m$ information bits, where $i_1$ has length $m + 1$ and $i_2$ has length $m - 1$.

Lemma 2 In the matrix $W^{(m)}$ there exists a submatrix made of $m - 1$ columns whose rows are the $2^{m-1}$ different binary sequences of $m - 1$ bits.

Proof - It is known [9] that Kerdock codes can be viewed as systematic codes. This means that all the different patterns of $2m$ bits must appear in the $2m$ information positions of the $2^{2m}$ codewords. As already mentioned, these codewords can be viewed as translations of R(1,m) due to the $2^{m-1}$ coset leaders $w_i$. These leaders, by Lemma 1, can be chosen to have $m + 1$ zeros in the $m + 1$ information positions of R(1,m):

    $w_i = (0 \mid x_i \mid y_i)$,

and the codewords of R(1,m) can be taken in systematic form:

    $c_R = (i_1 \mid z_1 \mid z_2)$.

Therefore every codeword of K(m) will be of the form

    $(i_1 \mid z_1 + x_i \mid z_2 + y_i)$.

According to the observation above, all the subvectors $(i_1 \mid z_1 + x_i)$ must be different for different pairs $i_1$ and $i$. In particular

    $(i_1 \mid z_1 + x_i) \ne (i_1 \mid z_1 + x_j)$

for every $i \ne j$. That is, $x_i \ne x_j$ for every $i \ne j$ or, in other words, all the subvectors $x_i$ in the $2^{m-1}$ vectors $w_i$ are distinct and range over all the different binary sequences of $m - 1$ bits.

□ 

As a consequence of Lemmas 1 and 2, $w_i$ may be taken of the form

    $w_i = (0 \mid i_2 \mid y)$,

so that the codewords will be of the form

    $c = (\, i_1 \mid z_1 + i_2 \mid z_2 + y \,)$,    (3)

where the three subvectors have lengths $m + 1$, $m - 1$ and $2^m - 2m$, respectively.

As noted in [9] the Kerdock code could also be viewed as a strictly systematic code at the cost of losing the orderly representation reported above.

Table 2: Weight distribution of ML correctable error patterns for K(4).


2.1 Application to K(4)

The above results applied to K(4) let the generating matrix of the underlying R(1,4) code be written in the form

    G_R = (G_1 | G_2 | G_3) =

        10000  111  01101001
        01000  110  11010101
        00100  101  10110011        (4)
        00010  011  10001111
        00001  000  01111111

and correspondingly the matrix of coset leaders in the form

    W^(4) =

        00000  000  00000000
        00000  001  11100101
        00000  010  10111001
        00000  100  11001011        (5)
        00000  011  01010011
        00000  101  00011101
        00000  110  00100111
        00000  111  11111110


A very interesting feature is the fact that suitable translations of K(4) cover the whole vector space $GF(2)^{16}$ without overlapping. In fact 256 correctable error patterns $\ell_i$ have been found by computer search such that the translates $\ell_i + K(4)$ do not overlap and cover the whole space $GF(2)^{16}$. The weight distribution $\{L_i\}_{i=0}^{16}$ of the correctable error patterns is reported in Table 2, where $L_i$ denotes the number of error patterns of weight $i$. This property shows that Standard Array decoding is possible for K(4), as will be described in Section 3.

From Table 2 the fact that K(4) is not quasi-perfect can also be observed. The property reported above for K(4) can be conjectured to hold for all Kerdock codes K(m):

    suitable translations of K(m) cover the vector space $GF(2)^{2^m}$ without overlapping.

2.2  Encoding 

The representation (3) shows that instantaneous encoding is possible. In fact the first $m + 1$ bits can be transmitted while they enter the encoder. At the $(m + 1)$-th bit the remaining parity check bits for the R(1,m) code, i.e. the vectors $z_1$ and $z_2$, can be computed. As the remaining $m - 1$ information bits enter the encoder, they are summed with the entries of vector $z_1$ and transmitted. At that point the coset leader $w_i$ (hence the vector $y$) is known, so that the remaining parity check bits can be computed as $z_2 + y$ and transmitted.
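As an illustration of the representation (3) for $m = 4$, the sketch below encodes 8 information bits into a 16-bit word. The bit strings are transcribed from the matrices (4) and (5) as printed here, so any transcription error would carry over; the names `G_R`, `W` and `encode` are illustrative assumptions, and the block encodes a whole word at once instead of bit by bit.

```python
# Rows of the generating matrix (4) and of the coset leader matrix (5),
# written as 16-bit strings (5 + 3 + 8 bits).
G_R = ["1000011101101001", "0100011011010101", "0010010110110011",
       "0001001110001111", "0000100001111111"]
W = ["0000000000000000", "0000000111100101", "0000001010111001",
     "0000010011001011", "0000001101010011", "0000010100011101",
     "0000011000100111", "0000011111111110"]


def to_bits(text):
    return [int(ch) for ch in text]


def encode(i1, i2):
    """Map i1 (5 bits) and i2 (3 bits) to the 16-bit word (i1 | z1 + i2 | z2 + y)."""
    c_r = [0] * 16
    for bit, row in zip(i1, G_R):        # Reed-Muller codeword (i1 | z1 | z2)
        if bit:
            c_r = [a ^ b for a, b in zip(c_r, to_bits(row))]
    # the coset leader (0 | i2 | y) is selected by its middle three bits
    leader = next(to_bits(w) for w in W if to_bits(w)[5:8] == list(i2))
    return [a ^ b for a, b in zip(c_r, leader)]
```

The first five output bits always reproduce `i1`, and distinct inputs give distinct words; pairwise Hamming distances of the 256 words can then be computed to compare with the expected minimum distance $2^{m-1} - 2^{m/2-1} = 6$.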

3  Decoding  and  Quantization 

In this Section some procedures for decoding Kerdock codes and for performing the vector quantization based on Kerdock codes are described. As a consequence of the representation introduced in Section 2, the problem of decoding Kerdock codes may be formulated as follows:

    given a received word $r$, find the pair $[w_i, c_R]$ made of a coset leader and a Reed-Muller codeword, such that the decoded codeword $c = w_i + c_R$ satisfies the chosen decoding criterion.

Several decision rules may be considered, their main difference lying in the manner adopted to resolve ties whenever more than $\lfloor (d-1)/2 \rfloor$ errors are detected, where $d$ is the minimum distance, since these codes are not perfect. In particular two strategies deserve special interest: the Maximum Likelihood (ML) and the Minimum Correction (MC) rules. They are defined as follows.

Maximum Likelihood rule: $r$ is decoded as the codeword $c$ that maximizes the conditional probability $p\{c \mid r\}$. On the BSC this rule coincides with minimum distance decoding, i.e. $r$ is decoded as the codeword $c$ corresponding to the minimum distance.

Minimum Correction rule: $r$ is decoded as the codeword at the minimum distance if the distance is less than or equal to $\lfloor (d-1)/2 \rfloor$; otherwise the information bits are extracted from the received word without any correction attempt.
Four algorithms for decoding Kerdock codes are described in the following, based on the arithmetic in GF(2). They show increasing memory requirements and decreasing computational complexity. Let us recall that the Hamming weight wt(x) of a vector $x$ is the number of its nonzero components and let us introduce the vector $r$, decomposed as in (1)

    $r = (r_1 \mid r_2 \mid r_3)$,

that will be referred to as the received vector.

Algorithm 1 (Minimum distance decoding) - The codewords $c_i$ are stored in a table. For every received vector $r$, the $2^{2m}$ Hamming distances $y_i = wt(r - c_i)$, $i = 1, \dots, 2^{2m}$, are computed and the minimum $y_i$ is found. Ties are resolved by random equiprobable choices. The decoded bits $i$ are recovered from the corresponding codeword $c_i$.

Algorithm 2 (Syndrome decoding: ML rule) - $H_R$, the parity check matrix of R(1,m), the vectors $z_i = H_R w_i$, $i = 1, \dots, 2^{m-1}$, and the vectors $u_j = H_R \ell_j$, $j = 1, \dots, 2^{2^m - 2m}$, are stored. For every $r$ the syndrome $s = H_R r$ is computed. The pair $[u_j, z_i]$ that sums to $s$ is then found and $\ell_j$ is recovered from $u_j$. The received vector $r$ is finally decoded as $c = r + \ell_j$ and the information bits $i$ are recovered from $c$.

This algorithm must be restricted to K(4), as it takes advantage of the fact that Standard Array decoding is possible (see Section 2.1). As already mentioned, we conjecture that it may also be applied to decode K(m), for every m even.

Algorithm 3 (Syndrome decoding: MC rule) - This is the previous scheme adapted to the MC decoding rule.

$H_R$, parity check matrix of R(1,m), a table $T$ of the syndrome vectors $H_R e$ associated to error patterns $e$ of weight not greater than $\lfloor (d-1)/2 \rfloor$, where $d = 2^{m-1} - 2^{m/2-1}$ is the minimum distance of K(m), and $G_R$, generating matrix of R(1,m), are stored. For every $r$ the syndrome $s = H_R r$ is computed. The error pattern $e$ is searched in $T$, using the entry $s$. If it is found, $r$ is decoded as $c = r + e$ and the information bits $i$ are recovered from $c$. Otherwise the first $m + 1$ information bits $i_1 = r_1$ are taken unmodified, the vector $a = G_2^T i_1$ is computed and the remaining $m - 1$ information bits are obtained as $i_2 = r_2 + a$.
Algorithm 4 (Tabular decoding) - A table $T_1$ of the indices $j$ associated to the error patterns $\ell_j$ for every $r \in GF(2)^{2^m}$ and a table $T_2$ of the error patterns $\ell_j$, $j = 1, \dots, 2^{m-1}$, are stored. For every $r$ the index $j$ is obtained using Table $T_1$. The error pattern $\ell_j$ is read in Table $T_2$ using $j$, in order to compute $c = r + \ell_j$. The information bits $i$ are finally recovered from $c$.

Vector quantization is a field where K(4) has found a valuable application. Let us formally recall the vector quantization problem with minimum squared-error distortion. Let $x = (x_1, \dots, x_n) \in R^n$ be the input to the vector quantizer and let $\{c_i\}_{i=1}^{N}$ be the set of codewords. The problem may be formulated as follows:

    find the codeword $c_i$ among the $N$ possible ones which minimizes the squared error

        $\| x - c_i \|^2 = x^T x - 2 x^T c_i + c_i^T c_i$,    (6)

    where, if $c_i^T c_i$ is independent of $i$, the minimum distance is achieved by the codeword $c_i$ yielding the largest scalar product $y_i = x^T c_i$.

The most efficient algorithms known today for performing the vector quantization using K(4) are based on the Hadamard Transform (HT), whose definition, for easy reference, will now be recalled.

Let $H_n$ denote a Hadamard matrix in Sylvester form, which is an $n \times n$ matrix of +1's and -1's with the property that the scalar product of any two distinct rows is 0. Thus $H_n$ must satisfy the relation

    $H_n H_n^T = n I$,

where $I$ is the $n \times n$ identity matrix.

An n-dimensional column vector $y$ is called the HT of the vector $x$ if it is obtained by multiplying the vector $x$ by a Hadamard matrix, i.e.

    $y = H_n x$.
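The definition can be checked numerically with a short Python sketch. It builds the Sylvester matrix recursively and compares the matrix product with the textbook in-place butterfly algorithm, which needs $n \log_2 n$ additions; this butterfly is a standard technique and is not necessarily the method of Appendix A. The function names are illustrative assumptions, and $n$ is assumed to be a power of 2.

```python
def sylvester(n):
    """Hadamard matrix of order n (a power of 2) in Sylvester form."""
    H = [[1]]
    while len(H) < n:
        # block form [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def fht(x):
    """Fast Hadamard transform of x, equal to sylvester(len(x)) times x."""
    y = list(x)
    h = 1
    while h < len(y):
        for start in range(0, len(y), 2 * h):
            for j in range(start, start + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b   # one butterfly
        h *= 2
    return y
```

For a vector of length 8 the butterfly uses 24 additions, against 56 for the direct matrix product.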

In this context we shall consider the i-th binary codeword of K(m) as a vector of +1's and -1's, with +1's replacing 0's and -1's replacing 1's. It is easy to see that this will replace the usual vector sum over GF(2) with the component-wise (dot) product of integer vectors, hereafter denoted $\odot$.
The scalar and dot products are compatible in the sense that the following property holds

    $x^T (y \odot z) = (x \odot y)^T z$.    (7)
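Both facts, the correspondence between the GF(2) sum and the component-wise product under the mapping 0 to +1 and 1 to -1, and the property (7), can be verified on small vectors. The helper names in this Python sketch are illustrative assumptions.

```python
def to_pm1(bits):
    """Map binary components to signs: 0 -> +1, 1 -> -1."""
    return [1 - 2 * b for b in bits]


def odot(u, v):
    """Component-wise product of two vectors."""
    return [a * b for a, b in zip(u, v)]


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


u = [0, 1, 1, 0]
v = [1, 1, 0, 0]
xor = [a ^ b for a, b in zip(u, v)]
# the GF(2) sum becomes the component-wise product of the sign vectors
same = to_pm1(xor) == odot(to_pm1(u), to_pm1(v))
```

Property (7) holds because both sides equal the sum of the triple products $x_k y_k z_k$.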

Vector quantization using K(m) requires, by direct application of (6), the computation of $2^{2m}$ scalar products

    $y_i = x^T c_i, \quad i = 1, \dots, 2^{2m}$,    (8)

and $2^{2m} - 1$ comparisons to search for the maximum $y_i$.

Applying the property (7), $y_i$ may be computed as

    $y_i = x^T (w_j \odot c_R) = (x \odot w_j)^T c_R$.    (9)

As noted in [1,2,3], the $2^{m+1}$ codewords of R(1,m) can be grouped to form a Hadamard matrix $H_{2^m}$ and its negative $-H_{2^m}$. Therefore the $y_i$'s can be computed as $2^m$-dimensional HT's of the $2^{m-1}$ vectors $(x \odot w_j)$. Moreover only $2^{2m-1}$ comparisons are necessary to find the maximum scalar product; the search can be limited to the absolute values $|x^T c_i|$ and the proper codeword can then be chosen according to the sign of $x^T c_i$.
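For a single coset (the Reed-Muller code itself, in its +1/-1 form), the search over absolute values followed by a sign decision can be sketched as follows. The code uses a plain matrix product for clarity, and the function names are illustrative assumptions; for K(m) the same search would be repeated on each vector $x \odot w_j$.

```python
def sylvester(n):
    """Hadamard matrix of order n (a power of 2) in Sylvester form."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def best_signed_row(x):
    """Return (codeword, score): the row of H or -H with the largest scalar product with x."""
    H = sylvester(len(x))
    y = [sum(h * v for h, v in zip(row, x)) for row in H]   # y = H x
    k = max(range(len(y)), key=lambda i: abs(y[i]))         # n comparisons on |y|
    sign = 1 if y[k] >= 0 else -1                           # choose between H and -H
    return [sign * h for h in H[k]], abs(y[k])
```

Only the $n$ transform values are compared, although $2n$ candidate codewords are implicitly examined.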

The above observations can also be applied to the minimum distance
decoding of soft data. The computation of Algorithm 1 may be performed
by executing the HT's of the received vector r and of the companion vectors
r ⊙ w_j, j = 2, ..., 2^{m-1}. In the following, two algorithms that implement
the decoding along these lines are described.

Algorithm 5 (HT decoding: ML rule) - The matrix W^{(m)} of the
coset leaders and the Hadamard matrix H_{2^m} in Sylvester form are
stored. For every r, the 2^{2m} scalar products y_i = r^T c_i, i = 1, ..., 2^{2m},
are computed by performing the 2^m HT's H_{2^m}(r ⊙ w_j) and -H_{2^m}(r ⊙ w_j),
j = 1, ..., 2^{m-1}. The maximum y_i is then found, resolving ties by
random equiprobable choices. The received vector r is finally decoded
as the codeword c_i, from which the information bits i are recovered.

Algorithm 6 (HT decoding: MC rule) - The matrices H_{2^m}
and G_1, generating matrix of R(1,m), are stored. For every received
vector r, the 2^{2m} scalar products y_i = r^T c_i, i = 1, ..., 2^{2m}, are
computed by performing the 2^m HT's H_{2^m}(r ⊙ w_j) and -H_{2^m}(r ⊙ w_j),
j = 1, ..., 2^{m-1}. The maximum y_i is then searched. If there are no ties,
r is decoded as the codeword c_i, from which the information bits i are
recovered. Otherwise the first m + 1 information bits are taken unmodified
(i_1 = r_1), the vector a = G_1 q is computed and the remaining
m - 1 information bits are taken as i_2 = r_2 + a.
Direct application    Direct FHT    Proposed scheme
                                    2^m [3 + (m - 2) 2^{m-1}]

1,920                 512           304
129,024               12,288
8,355,840             262,144

Table 3: Complexity figures for vector quantization with K(m).

An  efficient  method  for  computing  the  HT’s  required  by  the  above  Algo¬ 
rithms  is  reported  in  Appendix  A,  together  with  computational  complexity 
remarks.  The  resulting  complexity  figures  are  summarized  in  Table  3. 

4 Bit and Word Error Probabilities for K(4)

In general it is very hard to compute either the bit error rate or the word
error rate for nonlinear codes. For K(4), however, such a computation is
feasible because its structure is very similar to that of linear codes. In fact,
as previously observed, the decoding can be organized as a Standard Array,
since the translates of K(4) by the correctable error patterns do not overlap
and cover the whole vector space of dimension 16 over GF(2). In this case
(see [1,6]) the bit error probability p_b and word error probability p_w after
complete decoding on the BSC can be expressed as a polynomial in the raw
bit error rate p of the BSC:


1  16 


where the coefficients are reported in Table 4. They have been computed
from the Standard Array according to a counting scheme proposed in [1].
Interesting  are  the  asymptotic  expressions  p»  x  iy^p8  and  pt  x  A^pp8 
for  the  ML  and  MC  decoding  respectively,  as  p  tends  to  zero.  From  these 
relations  it  follows  that,  at  least  asymptotically,  the  MC  rule  is  superior  to 
the  ML  rule  as  far  as  the  bit  error  rate  is  concerned.  On  the  other  hand, 
as  expected,  the  word  error  probability,  whose  asymptotic  expressions  are 
p_w ≈ 448 p^3 and p_w ≈ 504 p^3, shows a better asymptotic expression in the
ML  case. 
5 Conclusions


In  this  paper  we  have  dealt  with  many  different  properties  of  Kerdock  codes. 

A  description  of  Kerdock  codes  that  allows  instantaneous  encoding  was 
given.  This  approach  leads  to  the  application  of  two  different  decoding 
strategies,  i.e.  the  well  known  Maximum  Likelihood  criterion  and  another 
one that we have called Minimum Correction rule. Referring to K(4), it
has been shown that a Standard Array can be built by translating the set
of codewords without overlapping. From the inspection of this Standard
Array it turns out that K(4) is not quasi-perfect (see also Table 2). The
same Standard Array allows the computation of the bit error rate for K(4)
on  the  binary  symmetric  channel,  with  respect  to  both  ML  and  MC  decoding 
strategies:  in  this  particular  case  MC  is  asymptotically  superior. 

Finally, a scheme suitable both for decoding and for vector quantization
based on K(m) has been analyzed. Based upon Hadamard transforms,
it shows very low computational complexity figures. Table 3 compares the
number of sums required by the proposed scheme with the standard FHT
and the direct application of Algorithm 1 above.
A Complexity of soft data decoding and vector
quantization based upon K(m)

Every discussion of decoding complexity suffers from the lack of suitable
measures of complexity. However, for most practical applications the number
of arithmetical operations (in any field), the number of logical operations
and the amount of storage required can be taken as meaningful figures. In
the following we estimate the complexity, in terms of the number of arithmetic
sums, for decoding and vector quantizing based upon K(m).

In [3], by referring to a definition of K(4) as N_16, it is shown that in the
vector quantization problem the nearest neighbor codeword can be found
with 304 additions and 128 comparisons. By using similar arguments, based
on a variant of the Fast Hadamard Transform (FHT), and using our
representation of Kerdock codes, we will introduce a generalization to K(m)
that shows the same complexity figure in the case m = 4.

It was already shown in Section 3 that the vector quantization problem
can be solved with the computation of 2^{m-1} Hadamard transforms of
dimension 2^m. The complexity of this computation stems from the following
observations, motivated in [1,2,3].


1. The HT of dimension 2^m may be computed by evaluating HT's of
smaller dimension. In fact the matrix H_{2^m} may always be written as

           | H_{2^{m-1}}    H_{2^{m-1}} |
H_{2^m} =  |                            |
           | H_{2^{m-1}}   -H_{2^{m-1}} |

This means that a HT of dimension 2^m may be computed by performing
two HT's of dimension 2^{m-1} and operating 2 · 2^{m-1} sums, and every
HT of dimension 2^{m-1} may be obtained from two HT's of dimension
2^{m-2} and 2 · 2^{m-2} sums, and so on.

This observation allows a decomposition of the Sylvester-type matrix
H_{2^m} in terms of H_4 submatrices, where the matrix H_4 has the structure

        | 1    1    1    1 |
H_4 =   | 1   -1    1   -1 |
        | 1    1   -1   -1 |
        | 1   -1   -1    1 |

As an example

         | H_4    H_4    H_4    H_4 |
H_16 =   | H_4   -H_4    H_4   -H_4 |
         | H_4    H_4   -H_4   -H_4 |
         | H_4   -H_4   -H_4    H_4 |


2. The vectors x and w_i can be partitioned into subvectors of dimension 4

x = (x_1 | x_2 | ... | x_{2^{m-2}})

w_i = (w_{1i} | w_{2i} | ... | w_{2^{m-2},i}),

and their dot product may be performed independently in each single
part

x ⊙ w_i = (x_1 ⊙ w_{1i} | x_2 ⊙ w_{2i} | ... | x_{2^{m-2}} ⊙ w_{2^{m-2},i}).

Note that the action of w_{ji} on x_j is to change the sign of some entries.

3. The HT's of (a_1, a_2, a_3, a_4) and (a_1, a_2, a_3, -a_4) require 12 sums

(a_1 + a_2) + (a_3 + a_4)
(a_1 + a_2) + (a_3 - a_4)
(a_1 + a_2) - (a_3 + a_4)
(a_1 + a_2) - (a_3 - a_4)
(a_1 - a_2) + (a_3 + a_4)
(a_1 - a_2) + (a_3 - a_4)
(a_1 - a_2) - (a_3 + a_4)
(a_1 - a_2) - (a_3 - a_4).

4. Given a^T = (a_1, a_2, a_3, a_4), the HT of a vector derived from a by an
even number of sign changes can be obtained from the HT of a by
simple permutations and sign changes. If we call b^T = (b_1, b_2, b_3, b_4)
the HT of a, we have

( a_1,  a_2,  a_3,  a_4)  <->  ( b_1,  b_2,  b_3,  b_4)
(-a_1, -a_2, -a_3, -a_4)  <->  (-b_1, -b_2, -b_3, -b_4)
(-a_1, -a_2,  a_3,  a_4)  <->  (-b_3, -b_4, -b_1, -b_2)
( a_1,  a_2, -a_3, -a_4)  <->  ( b_3,  b_4,  b_1,  b_2)
(-a_1,  a_2, -a_3,  a_4)  <->  (-b_2, -b_1, -b_4, -b_3)
( a_1, -a_2,  a_3, -a_4)  <->  ( b_2,  b_1,  b_4,  b_3)
(-a_1,  a_2,  a_3, -a_4)  <->  (-b_4, -b_3, -b_2, -b_1)
( a_1, -a_2, -a_3,  a_4)  <->  ( b_4,  b_3,  b_2,  b_1).

5. Due to the form of the matrix W^{(m)}, for each block of 4 columns
at least a couple of HT's as in Point 3 above must be computed. No other
transforms are required, due to the observation in Point 4. The total
number of sums is therefore

12 · 2^m / 4 = 3 · 2^m.

6. Due to Point 1 above, the combination of subtransforms to produce
H_{2^m}(x ⊙ w_i) requires the following number of sums

2 · 2^{m-1} + 2^2 · 2^{m-2} + ... + 2^{m-2} · 2^2 = (m - 2) 2^m.

7. The number of 4-dimensional HT's to be computed is 2^{m-2}, and the
combination of Point 6 is repeated for each of the 2^{m-1} vectors x ⊙ w_i,
so that the total number of sums is

3 · 2^m + 2^{m-1} (m - 2) 2^m = 2^m [3 + (m - 2) 2^{m-1}].

The final expression 2^m [3 + (m - 2) 2^{m-1}] gives the number of sums that are
sufficient for decoding and vector quantizing based upon K(m).
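
The counts can be tabulated with a few lines of code. In this sketch only `proposed_sums` is taken from the derivation above. The other two formulas are assumptions, chosen because they reproduce the surviving entries of Table 3: 2^{m-1} full FHT's of m 2^m sums each, and 2^{2m-1} scalar products of 2^m - 1 sums each.

```python
def proposed_sums(m):
    """Sums for the proposed scheme: 3*2^m for the H_4 blocks (Point 5)
    plus 2^(m-1) combinations of (m-2)*2^m sums each (Points 6 and 7)."""
    blocks = 3 * 2**m
    combinations = 2**(m - 1) * (m - 2) * 2**m
    return blocks + combinations


def direct_fht_sums(m):
    """Assumed count: 2^(m-1) standard FHT's, each costing m * 2^m sums."""
    return 2**(m - 1) * m * 2**m


def direct_application_sums(m):
    """Assumed count: 2^(2m-1) scalar products, each costing 2^m - 1 sums."""
    return 2**(2 * m - 1) * (2**m - 1)


# Rows of (direct application, direct FHT, proposed scheme) for m = 4, 6, 8.
rows = [(direct_application_sums(m), direct_fht_sums(m), proposed_sums(m)) for m in (4, 6, 8)]
```

For m = 4 the proposed scheme gives 304 sums, which agrees with the closed form 2^m [3 + (m - 2) 2^{m-1}].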
Appendix F


Multiplication in Galois Fields GF(2^m) *


Michele  Elia  and  Daniele  Vellata 
Dipartimento  di  Elettronica  -  Politecnico  di  Torino 
I  -  10129  Torino  -  Italy 


Abstract 

Many data encrypting and data encoding techniques operate in
Galois fields and require that the basic arithmetical operations of sum
and product be performed as quickly as possible. Here we propose
three schemes for computing products in GF(2^m), to be considered
alongside the known ones.


1  Introduction 

The decoding of multiple error-correcting cyclic codes [1,2,7] and the
encrypting of streams of digital data [4,5] usually operate in appropriate
Galois fields. The efficiency of the basic field operations of sum and product
is crucial to enable the execution of these processes without affecting the
overall system performance. In particular the product of field elements seems
to be the most critical operation.

Recently, hardware implementations [8,9] of finite field multipliers have
been proposed, which are based on known algorithms [10,11,12]. All these
algorithms use, to a different extent, linear feedback shift registers [6].

Here we introduce some alternative schemes for computing products in
GF(2^m) which we believe to be new and that in several cases outperform
the known algorithms. In particular:

•  Algorithm  I  is  the  direct  interpretation  of  the  product  definition;  at 
the  cost  of  some  storage  it  works  fast  and  with  no  limitations  as  to 
the  field  definition  or  order. 

• Algorithm II is reminiscent of the Fast Fourier Transform speeding-up
principle; it needs less memory than the previous one, but requires a
more complex implementation.

• Algorithm III is based on a special form of the primitive polynomial
defining the basis element for GF(2^m); it performs very well but is
limited to special values of m.

* This work was financially supported in part by the United States Army through its
European Research Office, under grant n. DAJA45-86-C-0044.

One of the main concerns in this kind of problem is the balance among
different resources or performance requirements. These issues will be
briefly discussed in the final section. In the next section we recall,
for the sake of easy reference, some useful notation and introduce the
necessary definitions.

2  Field  element  representation 

An element α of GF(2^m) can be represented either as a power of a primitive
element η, that is

α = η^{Ln(α)},

where Ln(α) denotes an integer number called the logarithm of α to base η, or
as a polynomial in β of degree m - 1, where β is a root of a polynomial g(x)
irreducible over GF(2) of degree m, that is

α = \sum_{i=0}^{m-1} α_i β^i,   α_i ∈ GF(2).

For later use we define the polynomial α(x) associated to α:

α(x) = \sum_{i=0}^{m-1} α_i x^i.

Note that α = α(β).

It is commonly believed that exponential representations are better for
computing products while polynomial representations are better for
computing sums in GF(2^m) [2]. Actually, the matter is slightly different, because
multiplication of numbers in the exponential representation requires the
execution of sums of integers modulo 2^m - 1, with the waste of time due to
carry propagation. Moreover, in many applications the polynomial
representation is unavoidable.
Now let us recall how the product of two numbers α and γ in GF(2^m) may
be computed when polynomial representation is used. Writing

δ = αγ = \sum_{i=0}^{m-1} γ β^i α_i

we see that δ is computed by summing up γβ^i whenever α_i = 1. The
addend γβ^i can be obtained as the content of a linear feedback shift register,
having characteristic polynomial g(x), starting with initial content γ, and
performing i steps. Along the same line as the well known algorithm used
to evaluate the product of integers, we may represent the products γβ^i,
i = 0, ..., n - 1, as an array of dots with the convention that each dot
corresponds to a product of two coefficients. With abuse of language we say
that dots on the same column must be added modulo 2 and finally the stream
of dots of length 2n - 2 must be reduced to a stream of the n rightmost
positions. See Fig. 1.
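
The shift-register procedure recalled above can be sketched in software as follows. Field elements are held as integers whose bit i is the coefficient of β^i. The field GF(2^4) with g(x) = x^4 + x + 1 is an illustrative choice, not taken from the paper.

```python
def gf_mul_shift_add(alpha, gamma, g, m):
    """Multiply two elements of GF(2^m) in polynomial representation.

    g is the irreducible polynomial as an integer, including the x^m term."""
    delta = 0
    addend = gamma                 # gamma * beta^0
    for i in range(m):
        if (alpha >> i) & 1:
            delta ^= addend        # add gamma * beta^i whenever alpha_i = 1
        addend <<= 1               # one shift-register step: multiply by beta
        if (addend >> m) & 1:
            addend ^= g            # feedback: reduce using g(beta) = 0
    return delta


M = 4
G = 0b10011                        # x^4 + x + 1 (assumed example polynomial)

beta_cubed_times_beta = gf_mul_shift_add(0b1000, 0b0010, G, M)  # beta^4 = beta + 1

# beta generates the multiplicative group here: beta^15 returns to 1.
power = 1
for _ in range(15):
    power = gf_mul_shift_add(power, 0b0010, G, M)
```

The loop performs exactly m conditional additions and m register steps, which mirrors the serial structure of the register.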

Equivalently these operations may be described using the polynomial
representation, so that the product αγ can be computed by first executing
the product α(x)γ(x) and then reducing the result modulo g(x)

α(x)γ(x) = \sum_{i=0}^{2m-2} c_i x^i = δ(x) + q(x) g(x),   (1)

hence setting x = β to get

αγ = δ = \sum_{i=0}^{2m-2} c_i β^i,

where α = α(β), γ = γ(β) and δ = δ(β).

Observe that, for later use, the product α(x)γ(x) can be written as:

α(x)γ(x) = a(x) + x^m b(x),   (2)

where

a(x) = \sum_{i=0}^{m-1} a_i x^i,   b(x) = \sum_{i=0}^{m-2} b_i x^i,

that is, a_i = c_i and b_i = c_{m+i}.


3  Algorithms 

In  this  section  we  describe  the  algorithms  in  an  abstract  form  so  that  the 
presentation  is  not  influenced  by  present-day  technology.  Comparisons  with 
recent  implementations  will  be  given  in  the  final  section. 
Algorithm I. Using equation (1) and substituting x = β, we get

\sum_{i=0}^{2m-2} c_i β^i = \sum_{i=m}^{2m-2} c_i β^i + \sum_{i=0}^{m-1} c_i β^i.   (3)

In this expression only the sum from m to 2m - 2 must be processed. The
reduction process can be very fast if we have previously stored the following
m - 1 powers of β

β^j = \sum_{i=0}^{m-1} d_{ji} β^i,   m ≤ j ≤ 2m - 2,   d_{ji} ∈ GF(2),

which allows us to compute δ in a straightforward way

δ = \sum_{i=0}^{m-1} c_i β^i + \sum_{j=m}^{2m-2} c_j \sum_{i=0}^{m-1} d_{ji} β^i.
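
A software sketch of Algorithm I follows: the m - 1 reduced powers of β are stored once and then added in according to the high-order product coefficients. The integer encoding of field elements and the example field GF(2^4) with g(x) = x^4 + x + 1 are assumptions made for the illustration.

```python
def clmul(a, b):
    """Carry-less product of two binary polynomials held as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def power_table(g, m):
    """Stored data: beta^j reduced to degree below m, for m <= j <= 2m - 2."""
    table = {}
    p = 1 << (m - 1)               # beta^(m-1)
    for j in range(m, 2 * m - 1):
        p <<= 1                    # multiply by beta
        if (p >> m) & 1:
            p ^= g                 # reduce using g(beta) = 0
        table[j] = p
    return table


def gf_mul_table(alpha, gamma, table, m):
    """Product in GF(2^m): low part plus the stored powers selected by c_m .. c_{2m-2}."""
    c = clmul(alpha, gamma)        # coefficients c_0 .. c_{2m-2}
    delta = c & ((1 << m) - 1)     # sum of c_i beta^i for i < m
    for j in range(m, 2 * m - 1):
        if (c >> j) & 1:
            delta ^= table[j]
    return delta


M = 4
T = power_table(0b10011, M)        # x^4 + x + 1 (assumed example polynomial)
example = gf_mul_table(0b1111, 0b1111, T, M)
```

The table holds (m - 1) m bits, which is the storage that Algorithm I trades for speed.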


Algorithm II. To describe this algorithm, which requires less stored
data but is slower than Algorithm I, let us consider the first sum in equation
(2) and let us suppose that the following power of β is known

β^n = \sum_{i=0}^{m-1} b_i β^i   (4)

where n = m - 1 + ... Noting that

β^{n+j} = \sum_{i=0}^{m-1} b_i β^{i+j}

the (n+j)-th power can be obtained by shifting j times to the left the
sequence b_i, m - 1 ≥ i ≥ 0.

The powers of β whose exponent is between n and 2m - 2 do not need
to be stored. In fact, using equation (3) and the above observations, the
first sum in equation (1) can be reduced to the sum having the maximum
power of β less than or equal to n - 1. The next steps consist in repeating these
operations successively on the powers of β of exponent m - 1 + ...,
m - 1 + ..., ..., m - 1 + ⌊1/2 ⌊... 1/2 ⌊(m - 2)/2⌋ ...⌋⌋, the number of
iterations being ⌊log_2 (m - 2)⌋.
Algorithm III. This algorithm is based on special forms of the generating
polynomial g(x). Here we consider fields that have elements associated
to irreducible trinomials of the form g_l(x) = x^m + x^l + 1, where 2 ≤ l ≤ ⌊m/2⌋.
The cases g_1(x) = x^m + x + 1 and g_2(x) = x^m + x^2 + 1 will be considered
separately, both to start and to explain the procedure. In the case of g_1(x), we
can recast equation (2) as follows

α(x)γ(x) = a(x) + (1 + x) b(x) + (x^m + x + 1) b(x)

so that substituting x = β, we get δ as

δ = a(β) + (1 + β) b(β)

which is computed in two steps with no storage.

Also g_2(x) presents the same behaviour; in fact we have

α(x)γ(x) = a(x) + b(x) + x^2 [b(x) + b_{m-2} x^{m-2}] + b_{m-2} (x^2 + 1) +

         + b_{m-2} (x^m + x^2 + 1) + (x^m + x^2 + 1) b(x)   (5)

so that substituting x = β, we get δ as

δ = a(β) + b(β) + β^2 [b(β) + b_{m-2} β^{m-2}] + b_{m-2} (β^2 + 1)   (6)

which is computed in two steps with no storage.

In general we have

α(x)γ(x) = a(x) + (1 + x^l) b(x) + (x^m + x^l + 1) b(x).

This equation can be conveniently rewritten as

α(x)γ(x) = a(x) + b(x) + (x^m + x^l + 1) b(x) + \sum_{j=0}^{m-1} b_j x^{l+j} =

         = a(x) + b(x) + (x^m + x^l + 1) b(x) + \sum_{j=0}^{m-l-1} b_j x^{l+j} +

         + (x^m + x^l + 1) \sum_{j=0}^{l-1} b_{j+m-l} x^j + (x^l + 1) \sum_{j=0}^{l-1} b_{j+m-l} x^j   (7)

(with b_{m-1} = 0, since b(x) has degree m - 2), so that substituting x = β,
we get δ as

δ = a(β) + b(β) + \sum_{j=0}^{m-l-1} b_j β^{l+j} + (β^l + 1) \sum_{j=0}^{l-1} b_{j+m-l} β^j   (8)

which is computed in two steps with no storage.
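
The two-step reduction for a trinomial x^m + x^l + 1 can be sketched and compared with a generic polynomial reduction. The split of b(x) into a low part (coefficients b_0 to b_{m-l-1}) and a high part follows the general rewriting above as read here. The integer encoding and the test parameters m = 7, l = 3 are assumptions for the illustration.

```python
def clmul(a, b):
    """Carry-less product of two binary polynomials held as integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_generic(c, g, m):
    """Reference reduction of c(x) modulo g(x), one leading bit at a time."""
    for j in range(c.bit_length() - 1, m - 1, -1):
        if (c >> j) & 1:
            c ^= g << (j - m)
    return c


def mul_trinomial(alpha, gamma, m, l):
    """Product modulo x^m + x^l + 1 with 1 <= l <= m // 2, using no stored table."""
    if not 1 <= l <= m // 2:
        raise ValueError("l must satisfy 1 <= l <= m // 2")
    c = clmul(alpha, gamma)
    a = c & ((1 << m) - 1)             # a(x): c_0 .. c_{m-1}
    b = c >> m                         # b(x): c_m .. c_{2m-2}
    lo = b & ((1 << (m - l)) - 1)      # b_0 .. b_{m-l-1}
    hi = b >> (m - l)                  # b_{m-l} .. b_{m-2}
    # a + b + x^l * lo, then (x^l + 1) * hi for the terms that reach degree m
    return a ^ b ^ (lo << l) ^ (hi << l) ^ hi


m, l = 7, 3
g = (1 << m) | (1 << l) | 1
agree = all(
    mul_trinomial(x, y, m, l) == reduce_generic(clmul(x, y), g, m)
    for x in range(1 << m)
    for y in range(1 << m)
)
```

Since l ≤ m/2, the term (x^l + 1) times the high part never exceeds degree m - 2, so no further reduction is needed.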
             Alg. I      Alg. II     Alg. III    Alg. SR     Alg. STP
   m         PS   NS     PS   NS     PS   NS     PS   NS     PS   NS
   4         12    3      ?    3      0    2      0    6      0    ?
   8         56    3     16    ?      0    2      0   14      0    8
  16        240    3     48    5      0    2      0   30      0   16

Table 1: Algorithm comparisons for GF(2^m)


• PS indicates the required storage measured in bits;

• NS indicates the number of steps between input and output;

• SR stands for Shift Register;

• STP stands for Scott-Tavares-Peppard.


4  Conclusions 

This paper presents three schemes for computing products in GF(2^m). The
algorithms are not strictly comparable, since they make use of different
resources. Special algorithms for performing products
in finite fields have been proposed in the scientific literature. In particular
the Massey and Omura multiplier utilizes the normal basis representation
of the field elements, while the Berlekamp multiplier uses both the standard
and dual basis representations: for both algorithms it is difficult to
change the polynomial which generates the field. The algorithm proposed
by Scott, Tavares and Peppard, which has been implemented in hardware, does
not present the previous limits and can be compared with the ones proposed
here.

For the sake of comparison, Table 1 shows for the mentioned algorithms the
amount of required storage and the number of steps between input and
output.


The facts emerging from this table were confirmed by both software
programming and hardware implementations. From the points of view of both
programming simplicity and execution time, Algorithm III is indisputably
preferable. Its limits stem from the fact that primitive irreducible
trinomials are not available for every m, and it is not known whether an
infinite number of such primitive trinomials exists.


References 

[1] E.R. Berlekamp, Algebraic Coding Theory, McGraw-Hill Book Company,
New York, 1968.

[2] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting
Codes, Elsevier, New York, 1976.

[3] W. Diffie and M.E. Hellman, New directions in cryptography,
IEEE Transactions on Information Theory, vol. IT-22, November 1976,
pp. 644-654.

[4] D. Denning, Cryptography and Data Security, Addison-Wesley,
Reading, MA, 1983.

[5] N. Koblitz, A Course in Number Theory and Cryptography,
Springer-Verlag, New York, 1987.

[6] R.J. McEliece, Finite Fields for Computer Scientists and Engineers,
Kluwer Academic Press, Boston, 1987.

[7] M. Elia, Algebraic decoding of the (23,12,7) Golay code, IEEE
Transactions on Information Theory, vol. IT-33, January 1987, pp. 150-151.

[8] B.B. Zhou, A new bit-serial multiplier over GF(2^m), IEEE Transactions
on Computers, vol. C-37, No. 6, June 1988, pp. 749-751.

[9] I.S. Hsu, T.K. Truong, L.J. Deutsch and I.S. Reed, A comparison
of VLSI architecture of finite field multipliers using dual, normal
or standard bases, IEEE Transactions on Computers, vol. C-37, No. 6,
June 1988, pp. 735-739.

[10] E.R. Berlekamp, Bit-serial Reed-Solomon encoders, IEEE Transactions
on Information Theory, vol. IT-28, November 1982, pp. 869-874.

[11] C.C. Wang, T.K. Truong, H.M. Shao, L.J. Deutsch, J.K.
Omura and I.S. Reed, VLSI architecture for computing multiplications
and inverses in GF(2^m), IEEE Transactions on Computers,
vol. C-34, August 1985.

[12] P.A. Scott, S.E. Tavares and L.E. Peppard, A fast multiplier for
GF(2^m), IEEE Journal on Selected Areas in Communications, vol. SAC-4,
January 1986.


Fig. 1 - General scheme of multiplier


Appendix  G 


On  the  Concatenation  of  Binary  Linear  Codes 
Abstract

Many  recent  applications  of  error-correcting  codes,  especially  in 
the  case  of  transmission  over  very  noisy  channels,  have  been  based 
on  concatenation  to  achieve  high  performances.  This  paper  considers 
concatenation  of  linear  block  and  convolutional  codes,  presenting  some 
considerations  on  the  bit  error  rate  computation  after  complete  decod¬ 
ing.  The  fact  that  code  concatenation  is  not  a  commutative  operation 
is  discussed. 


1  Introduction 

The  use  of  error-control  codes  is  steadily  increasing  in  a  variety  of  digital 
systems  like  digital  recording  [2],  satellite  links  [4,5,6]  and  HF  mobile  radio 
transmissions.  In  many  of  these  applications  coding  is  unavoidable  to  achieve 
high  performances  and  often  simply  to  allow  the  system  to  work. 

In  most  situations  the  symbol  error  probability  and  the  transmission  rate 
are  conflicting  targets,  and  the  application  of  efficient  and  flexible  codes 
is  necessary.  In  these  cases  the  choice  of  the  code  is  conditioned  by  two 
constraints,  namely  the  complexity  of  the  receiving  devices  and  the  decoding 
delay.  Code  concatenation  seems  to  offer  a  good  compromise  in  terms  of  the 
constraints  above. 

Furthermore,  several  applications  require  uneven  protection  of  the  infor¬ 
mation  symbols.  This  is  the  case,  for  example,  of  packetized  information 
transmissions,  where  the  protocol  information  carried  by  packets  often  re¬ 
quires  better  protection  than  the  information  part.  Unequal  error  protection 
is  a  target  easily  pursued  with  code  concatenation. 

* This work was financially supported by the United States Army through its European
Research Office, under grant n. DAJA45-86-C-0044.

Figure 1: Channel model with code concatenation


It  must  be  observed  that  the  concatenation  of  codes  does  not  give  optimal 
performances  as  promised  by  Shannon’s  bounds:  in  general  concatenated 
codes  are  not  as  powerful  as  the  best  single-stage  code  with  the  same  rate. 
However  multistage  decoding  presents  a  reduced  complexity.  Moreover  in 
some  interesting  practical  cases,  concatenation  yields  performances  that  are 
not  improved  by  any  known  single  code. 

This  paper  presents  some  features  that  are  peculiar  to  code  concate¬ 
nation.  It  is  structured  as  follows.  Section  2  describes  the  model  of  code 
concatenation assumed in the paper. Section 3 recalls some relevant results
on  the  computation  of  symbol  error  probabilities,  while  Section  4  presents 
error  probability  results  for  the  case  of  code  concatenation.  Finally,  some 
applications  of  concatenated  codes  are  described  in  Section  5. 

2  Channel  model  for  code  concatenation 

Let’s  consider  the  transmission  chain  resulting  from  the  concatenation  of 
two  codes.  The  model  of  the  system  is  shown  in  Fig.  1,  where  two  main 
parts  can  be  identified:  the  inner  and  the  outer  channel.  The  inner  channel 
is  a  discrete  channel  resulting  either  from  a  chain  of  modulator  +  physical 
channel  +  demodulator  or  from  another  embedded  coding  section.  The  inner 
channel  is  supposed  to  be  a  memoryless  binary  symmetric  channel  (BSC), 
characterized  by  an  error  probability  p.  The  code  directly  facing  the  inner 
channel is called inner code.

The outer channel is the discrete channel resulting from the chain inner
coder + inner channel + inner decoder, considered as a unit.
The inner code may be either a convolutional or a block code, sometimes
jointly used with modulation, providing either a hard or a soft output. Let
(n_IN, k_IN, d_IN) denote the parameters of the inner code, where d_IN is the
minimum distance in the case of block codes and the free distance in the
case of convolutional codes. Let R_IN = k_IN / n_IN be the inner code rate.

The outer code usually is a block code. Let (n_OUT, k_OUT, d_OUT) be the
parameters of the outer code, with the same conventions as for inner codes.
Let R_OUT = k_OUT / n_OUT be the outer code rate.

Code concatenation reduces the net information rate; the overall rate
results in

R = R_IN R_OUT.

The decoding delay V is an important parameter used to evaluate the
performance of codes. It is defined as the number of symbols passed between
the instant that an information symbol enters the encoder and the instant
that the same symbol comes out from the decoder. Concatenation usually
increases this figure, but other processing operations, like interleaving or
signal propagation, affect delay even more. Here we consider only the net
delay introduced by the coding and decoding operations. It is straightforward
to verify that

V = min { h n_IN | h n_IN ≥ n_OUT },   h integer.
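
As a small worked example of the two formulas above, the following sketch evaluates the overall rate and the net delay. The pairing of a (7,4) inner code with a (23,12) outer code is only an illustrative assumption, not a configuration discussed in the paper.

```python
from fractions import Fraction


def overall_rate(n_in, k_in, n_out, k_out):
    """Overall rate R = R_IN * R_OUT, kept as an exact fraction."""
    return Fraction(k_in, n_in) * Fraction(k_out, n_out)


def decoding_delay(n_in, n_out):
    """Smallest multiple h * n_in of the inner length that is not below n_out."""
    h = -(-n_out // n_in)          # integer ceiling of n_out / n_in
    return h * n_in


rate = overall_rate(7, 4, 23, 12)   # (4/7) * (12/23)
delay = decoding_delay(7, 23)       # four inner blocks are needed to cover 23 symbols
```

With these parameters the rate is 48/161 and the delay is 28 symbols.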


3  Error  Probability  I  -  Basic  results 


In  this  section  we  recall  definitions  as  well  as  results  concerning  the  compu¬ 
tation  of  the  symbol  error  probability.  Let’s  consider  [n,  M,  d]  block  codes, 
n  being  the  dimension,  M  =  2k  the  number  of  code  vectors,  and  d  the 
minimum  distance. 

For every binary block code, linear or nonlinear, the bit error rate P_symb
is defined as [9]

P_symb = (1 / (kM)) \sum_{i=1}^{k} \sum_{j=1}^{M} Prob{x̂_i ≠ x_{ji} | x_j was sent}   (1)

where the code vectors x_j = (x_{j1}, x_{j2}, ..., x_{jk}) are equally likely, x̂ =
(x̂_1, x̂_2, ..., x̂_k) is the decoded vector, and k is the number of information bits
per codeword. In the case of linear codes on a BSC with error probability
p, Equation (1) can be written in the form


P_symb = (1/k) \sum_{e} f(e) p^{wt(e)} (1 - p)^{n - wt(e)}   (2)
where f(e) is the number of incorrect information bits after decoding with
the assumption that the all zero code vector was transmitted, and the
summation is extended over all the 2^n binary n-tuples. wt(·) is the Hamming
weight function: wt(x) is the number of nonzero bits in x. For computational
purposes equation (2) can be rewritten in the form

P_symb = \sum_{i=0}^{n} B_i p^i (1 - p)^{n-i} = \sum_{i=0}^{n} E_i p^i   (3)

where

B_i = (1/k) \sum_{wt(e)=i} f(e)   (4)

and \sum_{wt(e)=i} sums f(e) for all error patterns e of Hamming weight i.
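
The counting behind the coefficients B_i can be sketched on a small code. The example below uses a systematic (7,4) Hamming code whose parity equations are an illustrative choice, and builds the standard array by taking, for each syndrome, the first minimum-weight pattern met; none of these choices comes from the paper.

```python
from fractions import Fraction
from itertools import product

N, K = 7, 4   # code length and number of information bits


def encode(u):
    """Systematic encoder of one (7,4) Hamming code (assumed parity equations)."""
    u0, u1, u2, u3 = u
    return (u0, u1, u2, u3, u0 ^ u1 ^ u3, u0 ^ u2 ^ u3, u1 ^ u2 ^ u3)


def syndrome(r):
    """Received parity bits XOR the parity recomputed from the received information bits."""
    return tuple(a ^ b for a, b in zip(encode(r[:K])[K:], r[K:]))


# Standard array: scanning patterns by increasing weight, the first pattern
# met for each syndrome is taken as the coset leader.
leaders = {}
for e in sorted(product((0, 1), repeat=N), key=sum):
    leaders.setdefault(syndrome(e), e)


def f(e):
    """Wrong information bits after decoding e when the all-zero word was sent."""
    lead = leaders[syndrome(e)]
    decoded = [a ^ b for a, b in zip(e, lead)]
    return sum(decoded[:K])


# B_i = (1/k) * sum of f(e) over all error patterns of weight i
B = [Fraction(0)] * (N + 1)
for e in product((0, 1), repeat=N):
    B[sum(e)] += Fraction(f(e), K)


def p_symb(p):
    """Bit error rate after complete decoding on a BSC with error probability p."""
    return sum(float(b) * p**i * (1 - p) ** (N - i) for i, b in enumerate(B))
```

For larger codes the enumeration of all 2^n patterns quickly becomes impractical, which is why closed-form counting schemes are of interest.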

Now let B(X,Y) denote the generating polynomial of the B_i's, i.e.

B(X,Y) = \sum_{i=0}^{n} B_i X^i Y^{n-i};

thus we can write

P_symb = B(p, 1 - p).   (5)

Moreover let W(X,Y) denote the weight enumerator polynomial, i.e.

W(X,Y) = \sum_{i=0}^{n} A_i X^i Y^{n-i}

where A_i is the number of code vectors whose Hamming weight is i. It
can be shown that B(X,Y) can be mechanically derived from W(X,Y)
by means of a linear operator, which admits an explicit representation as an
antisymmetric homogeneous differential operator in the algebra Z[[X,Y]],
see [14].

Closed-form expressions of the symbol error rate for many interesting block
codes are now known and reported in the literature. Unfortunately this is
not true for convolutional codes, although the bit error rate (BER)
asymptotic expressions are known for many interesting codes of both kinds.

BER curves. The curves of the bit error rate (BER) versus the error
probability p of the BSC show a threshold phenomenon for most commonly
used codes: there is a fast transition between the region where the code
reduces the channel noise and the region where the code is useless. Fig. 2
depicts such a phenomenon in the case of the Golay (23,12,7) code.

Figure 2: BER versus raw bit error probability for (23,12,7) Golay code

It must be noted that the symbol error rate on the BSC, even if derived
under the assumption of a stationary behavior of the BSC characteristics,
gives useful indications on the behavior in a time varying environment: if
the dynamic in the time varying environment is limited within a given range,
the performance is described by the image of this range, see Fig. 3.

Asymptotic expressions for BER. In the case of binary block codes
the asymptotic expressions for the BER take the form

P_symb ≈ B p^{⌊(d-1)/2⌋ + 1}   (6)

where d is the minimum distance of the code and B is a suitable constant.
Table 1 shows the asymptotic BER expressions for some interesting block
codes.

In  the  case  of  convolutional  codes,  it  has  been  shown  [10]  that  the  bit 
error  probability  can  be  written  in  the  form 

$P_{symb} = C\,[p(1-p)]^{d_f/2} + O(p^{d_f/2+1})$   (7)

so that, for small p,

$P_{symb} \simeq C\, p^{d_f/2}$   (8)

where d_f is the free distance of the particular code and C is a constant that
is normally difficult to evaluate. Values of C estimated by simulation for
some codes are reported in Table 2.

Table 1: Values of asymptotic BER and p_cr for some binary block codes

Figure 3: Bit error performance for time-varying channels. The case of
the (15,5,7) code.

L    d_f    BER      p            C

3    5      10^-6    4.0 10^-3    0.988
4    6      10^-6    7.0 10^-3    2.915
7    10     10^-6    1.5 10^-2    1317

Table 2: Estimates of the constant C by simulation. Legend:

L = constraint length of the rate 1/2 convolutional code
d_f = free distance
p = BSC raw error probability
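As a rough check on (8), the relation can be inverted to find the raw error probability p at which a code reaches a target BER. The sketch below is only an illustration: the function name is hypothetical, the (d_f, C) pairs are the ones read from Table 2, and it assumes that the pure asymptotic form (8) applies at the target BER.

```python
def raw_error_probability(target_ber, c, d_free):
    """Invert BER = C * p**(d_free / 2) for p (asymptotic form (8) only)."""
    return (target_ber / c) ** (2.0 / d_free)


# (d_free, C) pairs as read from Table 2; the target BER of 1e-6 is assumed.
for d_free, c in [(5, 0.988), (6, 2.915), (10, 1317.0)]:
    p = raw_error_probability(1e-6, c, d_free)
    print(f"d_f = {d_free:2d}, C = {c:8.3f}: p = {p:.2e}")
```

A larger free distance tolerates a noisier channel for the same target BER, even though the constant C grows quickly with d_f.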

Critical error probabilities. The critical error probability, another
interesting feature of binary codes, is defined as follows.

Definition 1 - The critical error probability p_cr for a binary code is defined
as the minimum error probability of a binary symmetric channel (BSC) for
which the bit error probability after complete decoding is not greater than the
raw error probability of the channel.

It is immediately apparent that it is not convenient to use the code whenever
the error probability on the BSC is greater than the critical error probability:
in such a case, in fact, the use of the code leads to worse error performance
than no coding at all. Table 1 shows the critical error probabilities for some
interesting linear codes.
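The crossing point between the decoded BER and the raw channel error probability can be located numerically. The sketch below does this by bisection for the (7,4,3) Hamming code; the polynomial is the decoded bit error rate of that perfect code under complete decoding, averaged over the codeword bits, and both it and the search bracket are assumptions of this example rather than data taken from Table 1.

```python
def hamming74_ber(p):
    """Decoded BER of the (7,4,3) Hamming code on a BSC (complete decoding)."""
    q = 1.0 - p
    return (9 * p**2 * q**5 + 19 * p**3 * q**4 + 16 * p**4 * q**3
            + 12 * p**5 * q**2 + 7 * p**6 * q + p**7)


def critical_probability(ber, lo=0.01, hi=0.4, tol=1e-9):
    """Bisection on ber(p) - p; assumes a single sign change inside [lo, hi]."""
    assert ber(lo) < lo and ber(hi) > hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ber(mid) < mid:
            lo = mid   # the code still helps: the crossing lies above mid
        else:
            hi = mid   # the code already hurts: the crossing lies below mid
    return 0.5 * (lo + hi)


p_cr = critical_probability(hamming74_ber)
print(f"p_cr = {p_cr:.3f}")
```

The same routine applies to any code for which the decoded BER is available as a function of p.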

It  might  be  useful,  in  order  to  select  among  alternative  codes  over  a  BSC, 
to  define  the  relative  critical  error  probability  as  follows. 

Definition 2 - The relative critical error probability p_rcr for a pair of
binary codes is defined as the minimum error probability of a binary
symmetric channel at which the bit error probabilities after complete decoding
for the two codes are equal.

Note that the critical error probability may also be considered for
concatenated codes: in fact one of the advantages deriving from “good”
concatenations is the increase of the resulting critical error probability,
maintaining at the same time good code performance for p below such limit.
Figure 4: Asymmetry of code concatenation

4  Error  Probability  II  -  Concatenated  codes 

Recalling the code concatenation model given in Section 2, the inner code
may be viewed as a mechanism that transforms the error probability p_i
of the inner channel into p_o = f_i(p_i), the bit error probability of the
outer channel. The outer code performs a similar operation by transforming
p_o into the error probability of the concatenated system, p_s = f_o(p_o).
The resulting transformation is

$p_s = f_o(f_i(p_i)).$

Due to the non-linearity of the functions f_i(·) and f_o(·), this is in general
different from

$p_s = f_i(f_o(p_i)).$

Therefore the optimal concatenation of two codes, under the sole constraint
of achieving the minimum symbol error probability at a given rate, depends
in general on the order in which the two codes operate.

A sketch of how the concatenation of two codes depends on the order is
shown in Fig. 4. In the figure, for a given value of the inner channel error
probability, p_i = p = 0.13, and two particular codes with similar rates
(code 1 is the Golay (23,12,7) code and code 2 is the Hamming (7,4,3)
code), it is shown that two different values of the overall error probability
p_s can be obtained. In particular, when code 1 is chosen as the inner code,
p_o = p_1 = 0.112 and p_s = p_12 = 0.0806, while, when code 2 is the inner
code, p_o = p_2 = 0.103 and p_s = p_21 = 0.0657.
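The dependence on the order can be explored numerically. The sketch below is an illustration under stated assumptions: both codes are perfect, so the decoded BER follows from the weight distribution alone; the BER is averaged over codeword bits; and the outer channel is treated as a BSC with independent errors. The function and constant names are hypothetical, not the source's.

```python
from math import comb

# Weight distributions {codeword weight: number of codewords}
HAMMING_7_4 = {0: 1, 3: 7, 4: 7, 7: 1}
GOLAY_23_12 = {0: 1, 7: 253, 8: 506, 11: 1288, 12: 1288,
               15: 506, 16: 253, 23: 1}


def perfect_code_ber(p, n, t, weights):
    """Decoded BER of a perfect t-error-correcting code of length n on a BSC.

    Every error pattern lies within distance t of exactly one codeword,
    and complete decoding leaves that codeword as the residual error.
    """
    ber = 0.0
    for j, count in weights.items():
        if j == 0:
            continue  # pattern corrected, no residual errors
        for s in range(t + 1):       # distance from the pattern to the codeword
            for a in range(s + 1):   # a ones cleared, s - a zeros set
                w = j + s - 2 * a    # weight of the channel error pattern
                patterns = comb(j, a) * comb(n - j, s - a)
                ber += count * patterns * (j / n) * p**w * (1 - p)**(n - w)
    return ber


def golay(p):
    return perfect_code_ber(p, 23, 3, GOLAY_23_12)


def hamming(p):
    return perfect_code_ber(p, 7, 1, HAMMING_7_4)


p_i = 0.13
golay_inner = hamming(golay(p_i))     # code 1 inner, code 2 outer
hamming_inner = golay(hamming(p_i))   # code 2 inner, code 1 outer
print(f"inner Golay:   p_o = {golay(p_i):.3f}, p_s = {golay_inner:.4f}")
print(f"inner Hamming: p_o = {hamming(p_i):.3f}, p_s = {hamming_inner:.4f}")
```

Under these assumptions the two orders give clearly different overall error probabilities, and the results can be compared with the values quoted for Fig. 4.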

It should be remembered that the need for the high error-correcting
capability of concatenated codes often arises at high channel error probability
(p on the order of 5 · 10^-2), where the asymptotic expressions do not hold.

In  many  applications,  codes  are  used  in  the  presence  of  sufficiently  small 
channel  error  probabilities,  so  that  the  polynomial  expressions  introduced 
in  the  previous  section  can  be  substituted  with  their  asymptotic  form  for 
p  tending  to  zero:  this  means  that  we  can  consider  only  the  term  of  the 
polynomials  where  p  has  the  smallest  exponent. 
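To see how far the leading term can be trusted, the sketch below compares it with the full polynomial for the (7,4,3) Hamming code. It assumes a perfect code, for which every pattern of t + 1 errors (t = floor((d-1)/2)) is decoded into a minimum-weight codeword, so that the leading coefficient is C(n, t+1) d/n; the function names are hypothetical.

```python
from math import comb


def leading_term(n, d):
    """Coefficient and exponent of the lowest-order BER term (perfect codes)."""
    t = (d - 1) // 2
    return comb(n, t + 1) * d / n, t + 1


def hamming74_ber(p):
    """Full decoded-BER polynomial of the (7,4,3) Hamming code on a BSC."""
    q = 1.0 - p
    return (9 * p**2 * q**5 + 19 * p**3 * q**4 + 16 * p**4 * q**3
            + 12 * p**5 * q**2 + 7 * p**6 * q + p**7)


b, exponent = leading_term(7, 3)
for p in (1e-4, 1e-3, 1e-2, 5e-2):
    ratio = hamming74_ber(p) / (b * p**exponent)
    print(f"p = {p:.0e}: exact / asymptotic = {ratio:.3f}")
```

The ratio approaches 1 as p tends to zero and drifts away from it as p grows, which is the regime where the asymptotic form stops being a good substitute.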

Under the assumption that p is small enough for the asymptotic expressions
to hold, we may develop some considerations that allow the comparison,
in terms of the symbol error probability, of the two possible concatenation
orders for a pair of codes.

Let p_1 = A_1 p^{n_1} and p_2 = A_2 p^{n_2} be respectively the asymptotic bit
error probabilities of codes C_1 and C_2. If the concatenation has C_1 as the
outer and C_2 as the inner code, we obtain the asymptotic BER

$p_{12} = A_1 A_2^{n_1} p^{n_1 n_2}$

while C_2 outer and C_1 inner gives

$p_{21} = A_2 A_1^{n_2} p^{n_1 n_2}.$

In the particular case n_1 = n_2 = n, it is straightforward to see that the
best asymptotic performance is obtained when the inner code has the lower
asymptotic error probability, i.e. when it has the lower value of the
coefficient A: for n > 1, p_12 < p_21 exactly when A_2 < A_1.

Table 3 shows the asymptotic expressions of the bit error rate for the
two possible concatenation orders for some pairs of codes (taken from
Table 1).
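The composition of two asymptotic laws is simple enough to compute directly. In the sketch below each code is a hypothetical pair (A, n) standing for BER ~ A p^n. The pair (9, 2) is the leading term of the (7,4,3) Hamming code; the pair (115, 3) is an assumption chosen for illustration, such that one of the two orders gives the coefficient 83835 with exponent 6.

```python
def concatenate(outer, inner):
    """Asymptotic BER (A, n) of a concatenation of two codes.

    Feeding the inner output A_in * p**n_in into the outer law
    A_out * x**n_out gives A_out * A_in**n_out * p**(n_out * n_in).
    """
    a_out, n_out = outer
    a_in, n_in = inner
    return a_out * a_in**n_out, n_out * n_in


code_1 = (9, 2)     # leading term of the (7,4,3) Hamming code
code_2 = (115, 3)   # assumed coefficient and exponent, for illustration only

print(concatenate(code_1, code_2))  # C_1 outer, C_2 inner
print(concatenate(code_2, code_1))  # C_2 outer, C_1 inner

# Equal exponents: the order with the smaller coefficient inside wins.
low, high = (5, 2), (20, 2)
print(concatenate(high, low), concatenate(low, high))
```

Both orders always share the exponent n_1 n_2, so the comparison between them reduces to the two coefficients.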

5  Applications 

In  this  section  we  consider  some  widely  used  code  concatenations  and  we 
evaluate  their  performances. 
Order of         Asymptotic BER        Order of         Asymptotic BER
concatenation                          concatenation

1-2              [illegible]           2-1              83835 p^6
1-3              1979649 p^8           3-1
2-5              6.39 10^11 p^12       5-2              [illegible]
3-5              4.?8 10^15 p^16       5-3

Table 3: Comparison of code concatenation orders


5.1  Rate  1/2  convolutional  and  Reed-Solomon  code  chain 

One of the first code chains proposed, see [1], consists of a convolutional
code of rate 1/2 and constraint length 7 with Viterbi decoding as the inner
code and a Reed-Solomon (RS) code over GF(2^8) as the outer code.

The maximum free distance for noncatastrophic convolutional codes with
rate 1/2 and constraint length 7 is 10, see [21, page 251]. The asymptotic
BER for this convolutional code is, from (8), C p^5. The constant C has been
estimated by simulation, using the TOPSIM III simulator [20].

5.2  Uneven  error  protection  -  the  LPC  case 

Several applications require uneven protection of the information to be
transmitted. This is the case, for example, of packetized data transmissions,
where the protocol information carried by packets requires better protection
than the information part. A typical situation is the transmission of voice
digitized according to the Linear Predictive Coding (LPC) [22] approach.

The LPC-10 is a US Government standard that allows the transmission
of digitized voice at 2.4 kbit/s. The speech signal is segmented into contiguous
talkspurts of 22.5 ms, called frames. Each frame is coded into a 54-bit
packet. Frames can be of two kinds: voiced and unvoiced. Voiced frames
are reconstructed at the receiver by filtering a basic waveform with a filter
whose coefficients are estimated by the transmitter by means of the LPC
covariance analysis algorithm. These coefficients are transmitted inside the
54-bit packets. Unvoiced frames carry little information content, so
they are rebuilt as filtered noise, with a lower-order filter whose parameters
require fewer bits in the 54-bit packet; the remaining bits are used for error
protection of the bit stream. One bit is used to discriminate between voiced
and unvoiced packets.

It should be clear that LPC packets contain various kinds of information
(filter coefficients, other LPC parameters, voiced/unvoiced flag, error
protection, etc.) whose need for error protection varies: it is very important,
for example, not to spoil the content of the voiced/unvoiced flag, while an
error in a low order bit of a filter coefficient is much less significant. In
this case it is desirable to have a better error protection on some bits of the
packets: this target is easily pursued with code concatenation. The inner
code could be applied only to those parts requiring more protection, while
the outer code could protect the whole bit stream.

5.3  The  Compact  Disc  audio  system 

In the Compact Disc audio system error protection is achieved by the use
of two chained Reed-Solomon (RS) codes [2]. A (32,28,5) RS code is used
as the inner code and a (28,24,5) RS code as the outer code, both over
GF(2^8); errors detected by the inner code are treated as erasures by the
outer code. This concatenation order was chosen because of several
constraints, but it can be shown not to be optimal.

The  two  RS  codes  show  the  same  minimum  distance,  hence  the  same 
exponent  for  p  in  (6).  The  coefficient  B  in  the  same  equation  is  greater  for 
the  (32,28)  than  for  the  (28,24)  code,  since  the  former  offers  similar  error 
correcting  capabilities  on  a  larger  number  of  information  symbols.  But  in 
Section  4  it  has  been  shown  that,  under  these  conditions,  the  inner  code 
should  be  the  one  with  the  smaller  coefficient  B:  this  implies  that  the  reverse 
concatenation  order  would  lead  to  better  symbol  error  performances. 

5.4 Comparison of code concatenation with single stage codes

As a final example let us compare an instance of concatenated codes with
several single codes of comparable rate. It seems that concatenated codes
outperform any known single code even at the relatively small rate of 0.38.
The codes are listed below and the results are summarized in Table 4.

1. Concatenation of inner Hamming (15,11,3) code and outer Golay
(23,12,7) code

2. Concatenation of outer Hamming (15,11,3) code and inner Golay
(23,12,7) code

3. Single BCH (33,13,10) code

4. Single BCH (39,15,10) code

5. Single BCH (55,21,15) code
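The claim of comparable rates is easy to check from the code parameters alone, since the rate of a concatenation is the product of the rates of its stages. A minimal sketch, with hypothetical labels for the five schemes listed above:

```python
from fractions import Fraction

# Rate k/n of each scheme; a concatenation multiplies the two stage rates.
schemes = {
    "Hamming (15,11) with Golay (23,12)": Fraction(11, 15) * Fraction(12, 23),
    "BCH (33,13)": Fraction(13, 33),
    "BCH (39,15)": Fraction(15, 39),
    "BCH (55,21)": Fraction(21, 55),
}

for name, rate in schemes.items():
    print(f"{name}: rate = {float(rate):.3f}")
```

The two concatenation orders share the same rate, so schemes 1 and 2 differ only in error performance and decoding delay.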

The  asymptotic  BER  was  derived  according  to  the  approach  described 
in  [14]. 

The decoding complexity may be hard to define, because an efficient
decoding algorithm is not always available. Referring to the decoding
schemes available today, the codes used in concatenations 1 and 2 can be
decoded with the very efficient error-trapping procedure devised by Kasami
[16], while for the single-stage codes the known complete decoding
procedure, see [23] and [24], is direct computation of the minimum distance.
It turns out that the latter codes are incomparably more difficult to decode.

6  Conclusions 

The use of error-control codes calls for a compromise between efficiency
and complexity. The original scheme proposed by Forney [1], concatenating
two or more codes and modulation, provides the proper answer to the
problem: it is now well accepted that concatenation can be advantageously
applied to manage many interesting situations, and sometimes it can be the
only solution with affordable complexity.

The decreasing cost of digital circuits suggests that cost-effective
applications of codes will be even more widespread in the near future. Code
concatenation yields cheaper implementations together with high
performance. Many authors support the opinion that code concatenation is not
merely a trick, forced by our limited knowledge of the structure of codes, to
achieve good performance; rather it is an effective way to obtain high
performance with limited complexity. This opinion is upheld by the proof,
see [25], that the general decoding problem, even for linear codes, is
NP-complete.

Codes                       Rate     Dec.     Asymptotic          p_cr
inner        outer                   delay    BER

(15,11,3)    (23,12,7)      0.383    30       839,548,980 p^8     0.137
(23,12,7)    (15,11,3)      0.383    23       171,588,966 p^8     0.124
(33,13,10)                  0.394    33       35,960 p^8          0.290
(39,15,10)                  0.385    39       73,815 p^8          0.311
(55,21,15)                  0.382    55       178,181,640 p^8     0.279

Table 4: Comparisons among error-control schemes